Forum: Ruby Article on screen scraping w HTree+REXML, RubyfulSoup, WWW::

Peter Szinek (Guest)
on 2006-06-14 12:10
(Received via mailing list)
Hello all,

I am investigating the possibilities of screen scraping/web extraction/
automated web navigation/wrapper generation in Ruby. I have been working
with these technologies for several years, though (unfortunately) only in
Java and partly in C/C++. I came to know Ruby a few months ago and I am
currently investigating the existing tools for the above tasks. Since I
have the feeling that I am not alone (this topic is brought up regularly
here, maybe not as often as "how to create an Object from its
name", but it is close to that ;-) I have summarized my findings (tools
that I have found, descriptions, examples, comparison etc.); maybe they can
help someone.

http://www.rubyrailways.com/data-extraction-for-we...

You can find simple example solutions of the same problem (scraping
links from a google result page) with regular expressions, HTree+REXML,
RubyfulSoup and WWW::Mechanize.
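
To give a rough idea of the simplest of these approaches, here is a minimal sketch of link extraction with a regular expression. It is not the code from the article: the sample markup, the SAMPLE_HTML constant and the extract_links method are made up for illustration, and a real result page would have to be fetched over HTTP first. A regular expression like this is also fragile against unusual markup, which is one reason to look at the parser-based tools.

```ruby
# Hypothetical sample markup standing in for a fetched result page.
SAMPLE_HTML = <<~HTML
  <html><body>
    <a href="http://www.ruby-lang.org/">Ruby</a>
    <a class="l" href="http://www.rubyonrails.org/">Rails</a>
    <p>No link here</p>
  </body></html>
HTML

# Return the href value of every <a> tag found in the given HTML string.
# (extract_links is an illustrative name, not part of any library.)
def extract_links(html)
  # scan returns one single-element array per match because the
  # pattern has exactly one capture group; flatten yields plain strings.
  html.scan(/<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i).flatten
end

puts extract_links(SAMPLE_HTML)
```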

I am planning to write more entries on this topic, involving screen
scraping from Rails, Gecko to Ruby GTK widget embedding, wrapper
generation etc. Please note that I am new to Ruby, so it is possible that
my code snippets are not yet optimal (suggestions welcome), but
they are all tested and working.

Feedback/corrections/suggestions would be very much appreciated!

If you liked the story, you can digg it here:

http://www.digg.com/programming/Data_extraction_fo...

Cheers,
Peter
James G. (bbazzarrakk)
on 2006-06-14 14:16
(Received via mailing list)
On Jun 14, 2006, at 5:07 AM, Peter Szinek wrote:

> http://www.rubyrailways.com/data-extraction-for-we...
> scraping-in-rubyrails/

This was a very good article.  Thank you for sharing it with us.

> Please note that I am new to Ruby so it is possible that
> my code snippets are not the most optimal yet (suggestions welcome),

Well, you sometimes declare variables inThisStyle, but Rubyists use
this_style_here.
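
A small sketch of that convention, for illustration only (the variable, class and method names below are made up and not taken from the article): local variables and methods use snake_case, while CamelCase is kept for class and module names.

```ruby
# Java-style camelCase local variable: valid Ruby, but unidiomatic.
resultLinks = ["http://www.ruby-lang.org/"]

# Conventional Ruby: snake_case for local variables.
result_links = ["http://www.ruby-lang.org/"]

# CamelCase is used for class and module names,
# snake_case again for the methods inside them.
class LinkCollection
  def initialize(links)
    @links = links
  end

  def first_link
    @links.first
  end
end

puts LinkCollection.new(result_links).first_link
```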

James Edward Gray II
Peter Szinek (Guest)
on 2006-06-14 16:02
(Received via mailing list)
James Edward Gray II wrote:
> On Jun 14, 2006, at 5:07 AM, Peter Szinek wrote:
>
>> http://www.rubyrailways.com/data-extraction-for-we...
>
> This was a very good article.  Thank you for sharing it with us.

Thx!

> Well, you sometimes declare variables inThisStyle, but Rubyists use
> this_style_here.

Thanks for the suggestion, I'll update it ASAP. (Coming from the Java
camp, that's why the camelsAreStillHaunting ;-)

Peter