Forum: Ruby scRUBYt! 0.3.1 released

Peter Szinek (Guest)
on 2007-05-29 21:35
(Received via mailing list)
Hello all,

scRUBYt! version 0.3.1 has been released with plenty of new features
and bugfixes based on your feedback. Enjoy!

============
What's this?
============

scRUBYt! is a very easy to learn and use, yet powerful Web scraping
framework based on Hpricot and mechanize. Its purpose is to free you
from the drudgery of web page crawling, looking up HTML tags,
attributes, XPaths, form names and other typical low-level web scraping
woes by figuring these out from your examples copy'n'pasted from the Web
page.
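
To make the "figuring these out from your examples" idea concrete, here
is a tiny stand-alone Ruby sketch. It is not scRUBYt!'s code and uses
neither Hpricot nor mechanize: Node and xpath_for are hypothetical names
made up for this illustration, and the "page" is a hand-built tree. It
only shows the principle of turning a piece of copy'n'pasted text into a
positional XPath.

```ruby
# Illustrative sketch only: NOT how scRUBYt! is implemented.
# Node and xpath_for are hypothetical names invented for this example.
Node = Struct.new(:tag, :text, :children)

# Depth-first search for the first node whose text equals `example`.
# Returns an absolute XPath with 1-based positions, or nil if no node matches.
def xpath_for(node, example, path = "/#{node.tag}")
  return path if node.text == example

  # Count siblings per tag so every step gets its XPath position.
  seen = Hash.new(0)
  node.children.each do |child|
    seen[child.tag] += 1
    found = xpath_for(child, example, "#{path}/#{child.tag}[#{seen[child.tag]}]")
    return found if found
  end
  nil
end

# A hand-built stand-in for a parsed HTML page.
PAGE = Node.new("html", nil, [
  Node.new("body", nil, [
    Node.new("div", nil, [Node.new("a", "Home", [])]),
    Node.new("div", nil, [
      Node.new("a", "About", []),
      Node.new("a", "The Difference", [])
    ])
  ])
])

puts xpath_for(PAGE, "The Difference")  # prints /html/body[1]/div[2]/a[2]
```

A real scraper would of course work on a parsed HTML document and would
have to generalize the path to similar elements; the sketch stops at
locating a single example.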

===========
What's new?
===========

[NEW] complete rewrite of the output system, creating
       a solid foundation for more robust output functions
       (credit: Neelance)
[NEW] logging - no annoying puts messages anymore!
       (credit: Tim Fletcher)
[NEW] can index an example - e.g.
       link 'more[5]'
       semantics: give me the 6th element with the text 'more'
[NEW] can use XPath checking an attribute value, like
       "//div[@id='content']"
[NEW] default values for missing elements (first version was done in
       0.2.8 but it did not work for all cases)
[NEW] possibility to click a button by its text (instead of its index)
       (credit: Nick Merwin)
[NEW] clicking radio buttons
[NEW] can click on image buttons (by specifying the name of the button)
[NEW] possibility to extract a URL in one step, like so:
       link 'The Difference/@href'
       i.e. give me the href attribute of the element matched by the
       example 'The Difference'
[NEW] new way to match an element of the page:
       div 'div[The Difference]'
       means 'return the div which contains the string "The Difference"'.
       This is useful if the XPath of the element is not constant across
       the same site (e.g. sometimes a banner or ad is added, sometimes
       not).
[NEW] clicking image maps; at the moment this is achieved by specifying
       an index, like
       click_image_map 3
       which means click the 4th link in the image map
[FIX] Replacing \240 (&nbsp;) with a space in the preprocessing phase
       automatically
[FIX] images are now downloaded correctly if the src
       attribute had a leading space, as in
       <img src=' /files/downloads/images/image.jpg'/>
[FIX] Other misc fixes - a ton of them!
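
For readers who want to see the example notations from the list above
side by side, here is a small stand-alone Ruby sketch that classifies
the four kinds of example strings mentioned (index, attribute, "contains"
and plain XPath). It is only an illustration: parse_example and the hash
keys it returns are hypothetical names, and the announcement does not say
how scRUBYt! itself tells the forms apart. In particular, treating a
digits-only bracket as an index is an assumption of this sketch.

```ruby
# Illustrative sketch only: NOT scRUBYt!'s parser.
# parse_example and the returned hash keys are hypothetical names.
def parse_example(example)
  case example
  when %r{\A/}                     # "//div[@id='content']" : a literal XPath
    { xpath: example }
  when %r{\A(.+)/@([\w-]+)\z}      # 'The Difference/@href' : attribute of the match
    { text: $1, attribute: $2 }
  when /\A(.+)\[(\d+)\]\z/         # 'more[5]' : zero-based index, i.e. the 6th match
    { text: $1, index: $2.to_i }
  when /\A(\w+)\[(.+)\]\z/         # 'div[The Difference]' : tag containing the text
    { tag: $1, contains: $2 }
  else                             # anything else is taken as plain example text
    { text: example }
  end
end

[
  "more[5]",
  "The Difference/@href",
  "div[The Difference]",
  "//div[@id='content']",
  "The Difference"
].each do |example|
  puts "#{example.ljust(24)} => #{parse_example(example).keys.join(', ')}"
end
```

The order of the branches matters: the index form is checked before the
"contains" form, so in this sketch 'div[5]' would be read as an index
rather than as a div containing the text "5".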

========
Comments
========

The win32 version is just being built as I am writing this, so it will
be available soon.

Please keep the feedback coming - bug reports, questions, suggestions
are warmly welcome at the scRUBYt! forum - http://agora.scrubyt.org.

Cheers,
The scRUBYt! team - http://scrubyt.org
al_batuul (Guest)
on 2007-05-29 21:48
(Received via mailing list)
Waiting eagerly for the Windows version
al_batuul
