Correcting HTML in Ruby

Hello,

I need to get content out of files which have incorrect HTML.
Is there any library that does that?

I have been looking for an HTML->XML library, but all attempts I have made
have been futile.

Thanks for your time,

Roland

On 6/21/06, Roland M. wrote:

I need to get content out of files which have incorrect HTML.
Is there any library that does that?

It’s not exactly what you’re looking for, but the W3C has a tool called
HTML Tidy [1] which may be of help. It can fix a lot of brain-damaged
HTML, and even does wonders for the horrible HTML generated by MS
Word.

[1] Clean up your Web pages with HTML TIDY
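
For what it's worth, here is a rough sketch of how Tidy could be driven from Ruby, assuming the `tidy` command-line tool is installed and on your PATH. The method names (`tidy_to_xml`, `text_of`, `element_texts`) are made up for this example, and it uses only the standard `open3` and `rexml` libraries rather than any Tidy binding. Treat it as a starting point, not a tested recipe for every kind of broken markup.

```ruby
require 'open3'
require 'rexml/document'

# Options for the `tidy` command-line tool: emit well-formed XHTML, stay
# quiet, use numeric entities (so the XML parser needs no HTML DTD), and
# write output even when tidy reports errors in the input.
TIDY_COMMAND = %w[tidy -asxml -quiet -numeric -utf8
                  --show-warnings no --force-output yes].freeze

# Pipes +html+ through an external command and returns what it prints.
# The exit status is not checked, because tidy exits non-zero whenever it
# had to warn about or repair something, which is the expected case here.
def tidy_to_xml(html, command = TIDY_COMMAND)
  output, errors, _status = Open3.capture3(*command, stdin_data: html)
  raise "#{command.first} produced no output: #{errors}" if output.strip.empty?
  output
end

# Collects the text inside +node+, including text in nested elements.
def text_of(node)
  case node
  when REXML::Text    then node.value
  when REXML::Element then node.children.map { |child| text_of(child) }.join
  else ''
  end
end

# Returns the text of every element called +name+ in an XML string.
def element_texts(xml, name)
  found = []
  walk = lambda do |element|
    found << text_of(element).strip if element.name == name
    element.elements.each { |child| walk.call(child) }
  end
  walk.call(REXML::Document.new(xml).root)
  found
end

# Example use (broken.html is a hypothetical file name):
#   xml = tidy_to_xml(File.read('broken.html'))
#   puts element_texts(xml, 'p')
```

Once the markup has been through Tidy it should be well-formed, so any XML library can take over from there; REXML is used above only because it ships with Ruby.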

On Jun 21, 2006, at 16:53, Roland M. wrote:

Hello,

I need to get content out of files which have incorrect HTML.
Is there any library that does that?

I have been looking for an HTML->XML library, but all attempts I have made
have been futile.

There was an article on doing something like this posted to the list
a short while back:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/197214

It seems to give a good overview of a few different solutions.

best of luck,
matthew smillie.