First, I’ll mention that I have used the search function and found some useful
topics, but I still haven’t found a solution because I lack Ruby
and Hpricot/XPath knowledge.
The problem is the following: from http://users.telenet.be/weerstation.drongen/index.htm/Current_Vantage_Pro.htm
I need to scrape the temperature and Today’s Rain values (I need them for an
engineering project). Using XPather and Firebug, I looked up the XPath to
the temperature values:
/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font (according to
XPather).
But when I try to print the value in Ruby, I get nil:
require 'hpricot'

# @response holds the page HTML; xpath is the path copied from XPather
begin
  doc = Hpricot(@response)
  puts((doc / xpath).inner_html)
rescue => e
  puts e
end
Since this returned nil, I tried to find out at which step of the path the
nil appears. Apparently /html/body/table/tbody is one step too far:
/html/body/table still returns output, but as soon as I add tbody I get nil.
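That step-by-step search for where nil starts can be automated. Below is a minimal sketch, assuming a simple absolute XPath with no `//` and no slashes inside predicates (true for paths copied from Firebug/XPather). `first_failing_step` is a hypothetical helper name. REXML, which is installed with Ruby, stands in for Hpricot here, and the HTML is a tiny made-up, well-formed stand-in for the weather page. REXML only parses well-formed XML, so it would not cope with the real page as-is.

```ruby
require 'rexml/document'

# Hypothetical helper: walks an absolute XPath one step at a time and returns
# [last_matching_prefix, failing_step], or nil if the whole path matches.
# Assumes no "//" and no "/" inside predicates.
def first_failing_step(doc, xpath)
  path = ''
  xpath.split('/').reject(&:empty?).each do |step|
    candidate = "#{path}/#{step}"
    return [path, step] if REXML::XPath.first(doc, candidate).nil?
    path = candidate
  end
  nil
end

# Made-up stand-in for the weather page: note there is no <tbody> in the markup.
html = <<~HTML
  <html><body>
    <table>
      <tr><td>Wind</td><td>5</td></tr>
      <tr><td>Rain</td><td>0.0</td></tr>
      <tr><td>Temperature</td><td><font><strong><small><font>12.3</font></small></strong></font></td></tr>
    </table>
  </body></html>
HTML
doc = REXML::Document.new(html)

p first_failing_step(doc, '/html/body/table/tbody/tr[3]/td[2]/font/strong/small/font')
# => ["/html/body/table", "tbody"]
```

The returned prefix is the part of the path you can keep. The step after it is where the parser's tree and Firebug's tree disagree.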
I’ve read that I should now try to rebuild the path, but I can’t figure out
how to do this. This is only my second serious Ruby script (actually only
the beginning of it) and the first time I’ve used Hpricot.
I’m looking forward to replies, and I’m sorry to bother you with yet
another Hpricot-nil topic, but I’m getting a bit desperate because of my
deadline…
It should work if you take tbody out of the XPath. I’ve read
somewhere that tbody doesn’t work with Hpricot, though I don’t know why.
Good luck.
xpath = "/html/body/table//tr[3]/td[2]/font/strong/small/font"
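To see why this suggestion helps, the sketch below runs both paths against the same made-up table twice: once as a server might send it (no tbody), and once with the tbody that Firefox inserts and Firebug therefore shows. REXML stands in for Hpricot. `cell_text` is a hypothetical helper, and the sample markup is invented for illustration.

```ruby
require 'rexml/document'

# Made-up table as the server sends it (no tbody) ...
served = <<~HTML
  <html><body><table>
    <tr><td>Wind</td><td>5</td></tr>
    <tr><td>Rain</td><td>0.0</td></tr>
    <tr><td>Temperature</td><td>12.3</td></tr>
  </table></body></html>
HTML
# ... and as Firefox/Firebug shows it, with an inserted tbody.
as_in_firebug = served.sub('<table>', '<table><tbody>').sub('</table>', '</tbody></table>')

WITH_TBODY = '/html/body/table/tbody/tr[3]/td[2]'
DESCENDANT = '/html/body/table//tr[3]/td[2]' # "//" = tr at any depth below table

# Hypothetical helper: text of the first node matching xpath, or nil.
def cell_text(html, xpath)
  node = REXML::XPath.first(REXML::Document.new(html), xpath)
  node && node.text
end

[served, as_in_firebug].each do |html|
  p [cell_text(html, WITH_TBODY), cell_text(html, DESCENDANT)]
end
# => [nil, "12.3"]     (served)
# => ["12.3", "12.3"]  (as in Firebug)
```

The `//` version matches in both trees, which is why it is the safer choice when you don’t know whether a parser added a tbody. The trade-off is that `//` also descends into nested tables, so it can match more than intended on complex layouts.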
There is more to it than “tbody does not work for hpricot”.
When an HTML parser (Firefox and Hpricot in this case) parses an HTML
page, it has to build a tree from it (a.k.a. the DOM).
The problem is that a lot (most?) of the HTML out there is badly
formatted, so building the DOM is very ambiguous: what if tags are not
nested properly, or are never closed, to name just two of many problems?
Every parser approaches this a bit differently (that’s one reason why you
get ‘works in IE but not in FF’ kinds of problems), and Firefox, for
example, even makes some effort to make the parsed HTML standards
compliant, such as inserting a tbody tag after a table tag if it’s missing.
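One way to make Firebug-style paths usable unchanged is to apply that tbody fix-up yourself before querying. Below is a rough sketch, assuming well-formed markup, with REXML standing in for Hpricot. `insert_tbody!` is a hypothetical name, and it imitates only this one fix-up, not Firefox’s full parsing rules.

```ruby
require 'rexml/document'

# Hypothetical helper imitating one Firefox fix-up: wrap each table's direct
# <tr> children in a <tbody>, so Firebug-style paths containing tbody match.
def insert_tbody!(doc)
  REXML::XPath.match(doc, '//table').each do |table|
    rows = table.elements.to_a('tr')   # direct <tr> children only
    next if rows.empty?                # rows already inside tbody/thead, or none
    tbody = REXML::Element.new('tbody')
    rows.each { |tr| tbody.add_element(table.delete_element(tr)) }
    table.add_element(tbody)
  end
  doc
end

html = <<~HTML
  <html><body><table>
    <tr><td>Wind</td><td>5</td></tr>
    <tr><td>Rain</td><td>0.0</td></tr>
    <tr><td>Temperature</td><td>12.3</td></tr>
  </table></body></html>
HTML
doc = insert_tbody!(REXML::Document.new(html))

cell = REXML::XPath.first(doc, '/html/body/table/tbody/tr[3]/td[2]')
puts cell.text  # prints 12.3
```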
However, this is only one small difference between how Hpricot and
Firefox parse HTML and build the DOM tree (on which XPaths are
evaluated). Hpricot tries to stay as close to Firefox as possible, but this
doesn’t always happen (though _why said he considers these cases bugs).
Bottom line: you can’t expect an XPath yanked from Firebug to work
with Hpricot/Mechanize (though it mostly does, and dropping the tbody
steps that Firefox inserted increases your chances even further).
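In the spirit of that bottom line, a pragmatic approach is to try the path exactly as copied from Firebug and, only if it finds nothing, retry with the tbody steps removed. Below is a minimal sketch with REXML standing in for Hpricot. `first_with_fallback` and `TBODY_STEP` are hypothetical names, and the regex assumes tbody steps look like `/tbody` or `/tbody[n]`.

```ruby
require 'rexml/document'

# Matches a "/tbody" or "/tbody[n]" step that is followed by another step or ends the path.
TBODY_STEP = %r{/tbody(\[\d+\])?(?=/|\z)}

# Hypothetical helper: try the XPath as copied from Firebug, then without tbody steps.
def first_with_fallback(doc, xpath)
  REXML::XPath.first(doc, xpath) ||
    REXML::XPath.first(doc, xpath.gsub(TBODY_STEP, ''))
end

html = <<~HTML
  <html><body><table>
    <tr><td>Wind</td><td>5</td></tr>
    <tr><td>Rain</td><td>0.0</td></tr>
    <tr><td>Temperature</td><td>12.3</td></tr>
  </table></body></html>
HTML
doc = REXML::Document.new(html)

node = first_with_fallback(doc, '/html/body/table/tbody/tr[3]/td[2]')
puts node ? node.text : 'not found'  # prints 12.3
```

Trying the original path first means pages that really do contain a tbody still match exactly. The fallback only kicks in when the parser’s tree lacks the tbody that Firebug showed.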
I’ll try it in a minute, thank you for the answer.
@Peter, thank you for the very complete explanation.