Hpricot

Good morning to you all,

I need to use Hpricot to search through HTML, find links, and search
through those links.

I’ve got as far as finding all the links:
links = page.search("/html/body//a")

How can I search through the links efficiently? For example, if I only
want to find links that point to http://www.joe.net, how can I do this
with Hpricot?

I also want to see if the link is nofollow or not.

Basically, to make a long question short, how can I search for certain
HTML inside the Hpricot elements?

Sincerely,
Joe

http://code.whytheluckystiff.net/hpricot/wiki/HpricotChallenge#WildcardinAttributeSearch

Should get you started.

On Wed, Aug 06, 2008 at 04:55:00PM +0200, Joe P. wrote:

want to find links that point to http://www.joe.net, how can I do this
with Hpricot?

I also want to see if the link is nofollow or not.

Basically, to make a long question short, how can I search for certain
HTML inside the Hpricot elements?

This is not just an RTFM reply, but a pointer to the docs that will help
you out:

http://code.whytheluckystiff.net/hpricot/wiki/AnHpricotShowcase

–Greg

Since you are using XPath, you can try something like:

links = page.search('/html/body//a[@href="http://www.joe.net"]')

Explanation: The brackets let you specify a filter for your search, and
you can use @ to access an attribute of an element. Note that = only
matches links whose href is exactly that string.

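Also, for more complicated stuff you can just iterate over the
collection that the search method returns, like so:

links.each do |link|
  href = link['href'].to_s                 # [] reads an attribute; nil if it is missing
  next unless href.include?('www.joe.net') # the criterion: link points at joe.net
  nofollow = link['rel'].to_s.split.include?('nofollow')
  puts "#{href} (nofollow: #{nofollow})"
end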
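
If the checks get more involved, the criteria test inside that loop could be pulled out into small helpers. The following is only a rough sketch: points_to_host? and nofollow? are hypothetical names (not part of Hpricot), and the sample pairs stand in for the values you would read with link['href'] and link['rel'], assuming those return the attribute strings or nil.

```ruby
require 'uri'

# Hypothetical helper: true if href is an absolute URL whose host equals `host`.
def points_to_host?(href, host)
  uri = URI.parse(href.to_s.strip)
  !uri.host.nil? && uri.host.casecmp(host).zero?
rescue URI::InvalidURIError
  false # unparsable hrefs simply do not match
end

# Hypothetical helper: rel is a space-separated token list, so split before comparing.
def nofollow?(rel)
  rel.to_s.downcase.split.include?('nofollow')
end

# Stand-in data: [href, rel] pairs as they might come from link['href'] and link['rel'].
samples = [
  ['http://www.joe.net/about', 'nofollow'],
  ['http://www.joe.net', nil],
  ['http://example.org/', 'external nofollow'],
  ['/relative/path', nil]
]

samples.each do |href, rel|
  next unless points_to_host?(href, 'www.joe.net')
  puts "#{href} nofollow=#{nofollow?(rel)}"
end
```

Comparing the parsed host rather than the raw string avoids false hits on URLs that merely contain "www.joe.net" somewhere in the path or query.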