From a URL to XPath 2.0

evansenter · February 20, 2008, 10:31pm

Hi,

I am trying to write a small script that lets me scrape HTML using
XPath 2.0. As much as I enjoyed using Hpricot, its lack of support for
indexed paths has forced me to look for a different tool (I’ve heard
REXML has the best XPath support). To use REXML, however, I first need
to convert the HTML to XML, and I have yet to find a good gem or plugin
to do that.

As I mentioned, however, my main interest is index support for XPath
queries against an HTML page pulled from an arbitrary generated URL.
Does anyone know of a good approach to handle this?

Thank you,

Ruby.new(user)

evansenter · February 21, 2008, 6:04am

Evan Senter wrote:

As I mentioned however, my main interest is having index support for

Hi, you might want to try HTML Tidy.

Project: http://tidy.sourceforge.net/
Try it online (output XML): HTML Tidy Online
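
Once the page has been tidied into well-formed XML, the indexed query from the original question could look roughly like the sketch below. It only covers the REXML step. The markup in `SAMPLE_XML` is a hand-written stand-in for Tidy's output, and the names `SAMPLE_XML` and `nth_item_text` are made up for illustration; real Tidy output may differ (for example, it may carry a namespace declaration), so treat this as a starting point rather than a finished scraper.

```ruby
require 'rexml/document'

# Hypothetical stand-in for Tidy's output: hand-written, well-formed markup.
SAMPLE_XML = <<~XML
  <html>
    <body>
      <ul>
        <li>first</li>
        <li>second</li>
        <li>third</li>
      </ul>
    </body>
  </html>
XML

# Hypothetical helper: returns the text of the nth <li> under /html/body/ul,
# or nil when there is no element at that position.
# XPath positions are 1-based, so position 1 is the first item.
def nth_item_text(xml, position)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "/html/body/ul/li[#{Integer(position)}]")
  node && node.text
end

puts nth_item_text(SAMPLE_XML, 2)
```

Here the indexed path `li[2]` selects the second list item, so the script prints `second`. A position past the end of the list gives `nil` instead of raising an error.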