Easy way for a Nuub to get link-element from a html-source

strube · November 26, 2007, 11:23am

Hi all.

I'm very new to Ruby and I'm not sure of the easiest way to do this.
I want to read the content from e.g. “www.spiegel.de” and pick out just
this line

and from this line the “title” and the “href”.

Since the order of the attributes in “link” is not fixed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parse.
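
For reference, the attribute-order concern can be sidestepped even without a parser: match each tag first, then collect its attributes into a Hash. The following is a minimal sketch using only the standard library. The method name `alternate_links` and the sample `html` string are made up for illustration (the sample is modeled on the title and URL quoted later in this thread), and the pattern will not cope with a `>` inside an attribute value or with unquoted values, so a real HTML parser remains the more robust choice.

```ruby
# Matches name="value" or name='value' pairs inside a single tag.
ATTRIBUTE = /([\w:-]+)\s*=\s*(["'])(.*?)\2/m

# Hypothetical helper: returns one Hash per <link rel="alternate"> tag.
# Attribute order is irrelevant because each tag's attributes are
# collected into a Hash before anything is looked up.
def alternate_links(html)
  tags = html.scan(/<link\b[^>]*>/im)
  tags.filter_map do |tag|
    attrs = {}
    tag.scan(ATTRIBUTE) { |name, _quote, value| attrs[name.downcase] = value }
    next unless attrs["rel"].to_s.downcase == "alternate"
    { title: attrs["title"], href: attrs["href"] }
  end
end

# Illustrative input only, not the real page source.
html = <<~HTML
  <head>
    <link rel="stylesheet" href="/css/style.css">
    <link href="http://www.spiegel.de/schlagzeilen/rss/index.xml" rel="alternate" title="SPIEGEL ONLINE als RSS-Feed">
  </head>
HTML

links = alternate_links(html)
links.each { |link| puts "#{link[:title]} -> #{link[:href]}" }
```

To try it on a live page, `html` could be read with open-uri instead of the inline sample.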

strube · November 26, 2007, 11:57am

Marcus S. wrote:

Since the order of the attributes in “link” is not fixed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parse.

Check out hpricot.

http://code.whytheluckystiff.net/hpricot/

Regards,
Lee

strube · November 26, 2007, 12:18pm

On 26.11.2007, at 11:23, Marcus S. wrote:

and from this line the “title” and the “href”

Since the order of the attributes in “link” is not fixed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parse.

How about hpricot?

http://code.whytheluckystiff.net/hpricot/

Kai Brust

strube · November 26, 2007, 12:38pm

How about hpricot?

http://code.whytheluckystiff.net/hpricot/

OK, hpricot then.

Is it just

gem install hpricot

or do I need to install this “Ragel” thing too? (And if so, what is the
best way to do that?)

strube · November 26, 2007, 2:24pm

Another possibility is scRUBYt!:

That looks good. Thank you!

strube · November 26, 2007, 12:48pm

Marcus S. wrote:

Since the order of the attributes in “link” is not fixed, a regexp
doesn't look like the first choice, and I couldn't find an HTML::Parse.

Another possibility is scRUBYt!:

==========================================
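require 'rubygems'
require 'scrubyt'

feed_data = Scrubyt::Extractor.define do
  # Load the page to scrape
  fetch 'http://www.spiegel.de/'

  # Select the feed <link> element by XPath, whatever its attribute order
  link "//link[@rel='alternate']" do
    # Read both values from attributes rather than from element text
    title "title", :type => :attribute
    href "href", :type => :attribute
  end
end

puts feed_data.to_xml
==========================================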

output:

==========================================
SPIEGEL ONLINE als RSS-Feed
http://www.spiegel.de/schlagzeilen/rss/index.xml
==========================================

or, to_hash:

==========================================
[{:title=>"SPIEGEL ONLINE als RSS-Feed",
  :href=>"http://www.spiegel.de/schlagzeilen/rss/index.xml"}]
==========================================

Cheers,
Peter

http://www.rubyrailways.com
http://scrubyt.org

strube · November 26, 2007, 2:58pm

Marcus S. wrote:

Another possibility is scRUBYt!:

That looks good. Thank you!

Hm, yeah, but the downside (as of the current version; it'll be fixed in
the next one) is that the installation process is somewhat... hmm... not
that easy (mainly if you are on win32). If you still decide to go for
scRUBYt!, we can talk on #scrubyt @ irc.freenode.net, or you can ask your
questions in the forum (http://agora.scrubyt.org).

Cheers,
Peter

http://www.rubyrailways.com
http://scrubyt.org
