Hpricot and path of an element

alex-osu3 · August 10, 2008, 8:38pm

Hi all,

I use Hpricot to load a page, and then I try to find the path of a
"font" element (<font>) in the page. Here is the example from
the tutorial
(http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics):

doc.at("#header").xpath
#=> "//div[@id='header']"

Here is my code:
puts doc.at("#font").xpath

When I run the code, Ruby complains about an undefined method xpath. I
wonder whether I have misunderstood the tutorial.

Thanks,

Li

alex-osu3 · August 10, 2008, 9:05pm

On Sunday 10 August 2008 13:36:42 Li Chen wrote:

I use hpricot to load a page. Then I try to find the path for an
element "font" (<font>) in the page.

So, you probably want:

(doc / 'font')

doc.at("#header").xpath
#=> "//div[@id='header']"

Right, that’s searching for a tag that looks like this:

<div id="header">

here is my code:
puts doc.at("#font").xpath

And that’s searching for a tag that looks like this (any tag name, as long as the id is "font"):

<div id="font">

If you’re following that example, you probably want:

puts doc.at('font').xpath
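
The difference between the two lookups can be sketched without Hpricot. What follows is only an illustration, assuming REXML (shipped with standard Ruby installs) as a stand-in parser and a made-up sample page; the XPath strings are rough equivalents of the selectors "#font" and "font", not Hpricot's own API. It also shows the likely cause of the original error: a lookup that matches nothing gives nil, and nil has no xpath method.

```ruby
require 'rexml/document'

# Made-up sample markup (well-formed, because REXML is an XML parser).
SAMPLE = <<~XML
  <html>
    <body>
      <div id="header">Title</div>
      <font color="red">ATGGCC</font>
    </body>
  </html>
XML

doc = REXML::Document.new(SAMPLE)

# Rough equivalent of the selector "#font": any element whose id is "font".
by_id = REXML::XPath.first(doc, "//*[@id='font']")

# Rough equivalent of the selector "font": any element named font.
by_tag = REXML::XPath.first(doc, "//font")

puts by_id.inspect  # nothing in the sample has id="font"
puts by_tag.xpath   # REXML's own path string for the element it found
```

With Hpricot itself the corresponding calls would be doc.at("#font") and doc.at("font").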

Now, first question: Why do you need the XPath? Usually, the idea is to
try to find that element, and then do something with it. So, for example:

To return all text:

(doc / 'font').text

To loop over each font element:

(doc / 'font').each { |tag|
  puts tag.inner_text
}

Second question: Why is there a font tag on this page? If you had any
hand in creating the page, shame on you – go learn some CSS.

In fact, go learn some CSS anyway. Hpricot supports both CSS selectors
and XPath, and it’s usually much easier to use the selectors. Years
later, I still remember, roughly, how selectors work – but only a few
months later, I’ve almost completely forgotten XPath.

There are things XPath can do that selectors can’t. But until you
encounter them, XPath is overkill.
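
One example of something XPath can express and the classic CSS selectors cannot is stepping from an element up to its parent. A minimal sketch, assuming REXML from the standard Ruby install as a stand-in for Hpricot and a made-up sample page:

```ruby
require 'rexml/document'

# Made-up sample markup; only the first paragraph contains a font element.
doc = REXML::Document.new(<<~XML)
  <html>
    <body>
      <p>intro <font color="red">ATGGCC</font></p>
      <p>no font tag here</p>
    </body>
  </html>
XML

# "//font/.." selects the parent of each font element.
parent = REXML::XPath.first(doc, "//font/..")
puts parent.name
```

A plain selector such as 'p font' goes the other way: it finds the font element inside the paragraph, not the paragraph that contains it.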

alex-osu3 · August 11, 2008, 3:01pm

David M. wrote:

Now, first question: Why do you need the xpath? Usually, the idea is to
try to find that element, and then do something with it. So, for example:

To return all text:

(doc / 'font').text

To loop over each font element:

(doc / 'font').each { |tag|
  puts tag.inner_text
}

I need to extract the text within this tag. I followed your code and
found that

(doc/'font').text and (doc/'font').html return the same results.
When I run (doc / 'font').each { |tag| puts tag.inner_text },
Ruby complains:
undefined method `inner_text' for #<Hpricot::Elem:0x2e9f9c4>
(NoMethodError)

So I changed it to tag.inner_html and it works. I checked the Hpricot
documentation and found that the method #inner_text is there, but I
cannot figure out why Ruby complains about it.
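
One possible workaround, sketched under assumptions: if the elements in the installed Hpricot version lack inner_text (which the error suggests, though the cause is not confirmed in this thread), the text can be approximated by stripping tags from inner_html. The helper names text_from_html and element_text and the FakeElem stand-in are hypothetical, and the regex is a crude tag stripper that does not decode entities such as &amp;.

```ruby
# Hypothetical helper: crude tag stripper for small HTML fragments.
def text_from_html(fragment)
  fragment.gsub(/<[^>]*>/, '')
end

# Use inner_text when the element provides it; otherwise fall back to
# stripping the markup out of inner_html.
def element_text(tag)
  if tag.respond_to?(:inner_text)
    tag.inner_text
  else
    text_from_html(tag.inner_html)
  end
end

# Stand-in for an element that only offers inner_html (an assumption
# made for this example, not a real Hpricot class).
FakeElem = Struct.new(:inner_html)

puts element_text(FakeElem.new('ATG<b>GCC</b>TAA'))  # prints ATGGCCTAA
```

In the loop above, this would be used as puts element_text(tag).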

Second question: Why is there a font tag on this page? If you had any
hand in creating the page, shame on you – go learn some CSS.

I am a newbie at HTML and website development. If you want to know why
there is a font tag in the page, please check this out:
http://www.ensembl.org/Homo_sapiens/exonview?db=core;transcript=ENST00000356766

What I am trying to do is extract some information I am interested in
from this page. I have no idea why they put this tag or that tag there,
and I don’t think knowing all the whys is my priority right now. I am
more concerned with getting the job done.

Anyway, thank you very much for the tips.

Li