I am trying to write a small script that allows me to scrape HTML using
XPath 2.0. As much as I enjoyed using Hpricot, its lack of support for
indexed paths has forced me to look for a different tool (I’ve heard
REXML has the best XPath support). To use REXML, however, I first need
to convert the HTML to XML, and I have yet to find a good gem or plugin
to do that.
As I mentioned, however, my main interest is index support for XPath
queries against an arbitrary HTML page pulled from a generated URL.
Does anyone know of a good approach to handle this?
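
To make the goal concrete, here is a minimal sketch of the kind of indexed query I am after. It assumes the page has already been converted to well-formed XML; the markup and the path below are made up purely for illustration, and fetching the page and the HTML-to-XML step are left out.

```ruby
require 'rexml/document'

# Hypothetical, already well-formed markup standing in for a converted page.
xml = <<~XML
  <html>
    <body>
      <ul>
        <li>first</li>
        <li>second</li>
        <li>third</li>
      </ul>
    </body>
  </html>
XML

doc = REXML::Document.new(xml)

# Indexed path: XPath positions are 1-based, so li[2] selects the second item.
second_item = REXML::XPath.first(doc, '/html/body/ul/li[2]')
second_text = second_item.text
puts second_text
```

The `li[2]` step is the part that matters to me: selecting an element by its position among its siblings.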