scRUBYt! 0.3.1 released

splattael · May 29, 2007, 9:35pm

Hello all,

scRUBYt! version 0.3.1 has been released with plenty of new features
and bugfixes based on your feedback. Enjoy!

============
What’s this?

scRUBYt! is a very easy to learn and use, yet powerful Web scraping
framework based on Hpricot and mechanize. Its purpose is to free you
from the drudgery of web page crawling, looking up HTML tags,
attributes, XPaths, form names and other typical low-level web scraping
woes by figuring these out from your examples copy’n’pasted from the Web
page.

===========
What’s new?

[NEW] complete rewrite of the output system, creating
a solid foundation for more robust output functions
(credit: Neelance)
[NEW] logging - no annoying puts messages anymore!
(credit: Tim F.)
[NEW] can index an example - e.g.
link 'more[5]'
semantics: give me the 6th element with the text ‘more’
[NEW] can use XPath checking an attribute value, like
"//div[@id='content']"
[NEW] default values for missing elements (first version was done in
0.2.8 but it did not work for all cases)
[NEW] possibility to click a button by its text (instead of its index)
(credit: Nick Merwin)
[NEW] clicking radio buttons
[NEW] can click on image buttons (by specifying the name of the button)
[NEW] possibility to extract a URL in one step, like so:
link 'The Difference/@href'
i.e. give me the href attribute of the element matched by the
example ‘The Difference’
[NEW] new way to match an element of the page:
div 'div[The Difference]'
means ‘return the div which contains the string “The
Difference”’.
This is useful if the XPath of the element is not constant across
the same site (e.g. sometimes a banner or ad is added, sometimes
not).
[NEW] clicking image maps; at the moment this is achieved by specifying
an index, like
click_image_map 3
which means click the 4th link in the image map
[FIX] \240 (&nbsp;, the non-breaking space) is now replaced with a
space automatically in the preprocessing phase
[FIX] images are now downloaded correctly even if the src
attribute has a leading space
[FIX] Other misc fixes - a ton of them!
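
To make the new example notations above a bit more concrete, here is a small sketch in plain Ruby of how 'more[5]' and 'The Difference/@href' can be read, plus the \240 cleanup. It is only an illustration of the semantics described in the list, not scRUBYt!'s actual code: the names parse_example, select_match and normalize_nbsp are made up for this post, and page elements are modelled as simple hashes instead of real Hpricot nodes.

```ruby
# Illustration only: hypothetical helpers, NOT part of the scRUBYt! API.

ParsedExample = Struct.new(:text, :index, :attribute)

# Splits an example string such as "more[5]" or "The Difference/@href"
# into its text, an optional zero-based index and an optional attribute.
def parse_example(example)
  text = example.dup
  index = nil
  attribute = nil

  # A trailing "/@name" asks for an attribute of the matched element.
  if (m = text.match(%r{\A(.*)/@([\w-]+)\z}))
    text, attribute = m[1], m[2]
  end

  # A trailing "[n]" picks the n-th match, counting from zero,
  # so "[5]" means the 6th element.
  if (m = text.match(/\A(.*)\[(\d+)\]\z/))
    text, index = m[1], m[2].to_i
  end

  ParsedExample.new(text, index, attribute)
end

# Picks the element for a parsed example from a list of candidates.
# Each candidate is assumed to be a Hash with :text and :attributes keys.
def select_match(candidates, parsed)
  matches = candidates.select { |c| c[:text] == parsed.text }
  element = matches[parsed.index || 0]
  return nil if element.nil?

  parsed.attribute ? element[:attributes][parsed.attribute] : element
end

# "\240" is octal for 0xA0, the non-breaking space; swap it for a
# plain space (written here as the Unicode escape U+00A0).
def normalize_nbsp(text)
  text.gsub("\u00A0", " ")
end
```

For instance, given seven links that all have the text "more", select_match(links, parse_example("more[5]")) returns the sixth of them, and parse_example("The Difference/@href") yields the text "The Difference" with the attribute "href".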

========
Comments

The win32 version is just being built as I am writing this, so it will
be available soon.

Please keep the feedback coming - bug reports, questions, suggestions
are warmly welcome at the scRUBYt! forum - http://agora.scrubyt.org.

Cheers,
The scRUBYt! team - http://scrubyt.org

splattael · May 29, 2007, 9:48pm

Waiting eagerly for the Windows version
al_batuul

Peter S. wrote:
[...]
