Daniel A. wrote:
Sorry for the late reply.
I’m surprised no one mentioned RubyfulSoup:
If I understand your problem correctly, it’s exactly what you need: a
forgiving HTML parser.
I recently tried using RubyfulSoup to parse a Web page, and it had some
peculiar behavior, such as stripping all attributes. Either I was not
using it correctly, or it was a bit too casual in making sense of the
markup.

I ended up using some crude string parsing to extract just the subset of
the page I wanted, which gave me well-formed XML suitable for REXML
manipulation. I got a phenomenal speed increase from that as well;
RubyfulSoup seems quite slow.
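
A minimal sketch of that string-slicing approach, assuming the part of the
page you want is itself well-formed even when the rest is not. The sample
page, the marker strings, and the extract_fragment helper are all
hypothetical, made up for illustration; this is not the original code.

```ruby
require 'rexml/document'

# Hypothetical page: the outer markup is sloppy (unclosed <p>, bare <br>),
# but the table we care about happens to be well-formed XML.
PAGE = <<~HTML
  <html><body>
  <p>Unclosed paragraph
  <br>
  <table id="prices">
    <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="name">Gadget</td><td class="price">19.50</td></tr>
  </table>
  <p>More sloppy markup
  </body></html>
HTML

# Return the text from the first start_marker through the next end_marker
# (inclusive), or nil if either marker is missing.
def extract_fragment(text, start_marker, end_marker)
  start = text.index(start_marker)
  return nil unless start
  finish = text.index(end_marker, start)
  return nil unless finish
  text[start...(finish + end_marker.length)]
end

fragment = extract_fragment(PAGE, '<table id="prices">', '</table>')

# REXML raises REXML::ParseException if the fragment is not well-formed.
doc = REXML::Document.new(fragment)

# Attributes survive, since REXML parses the slice as ordinary XML.
REXML::XPath.match(doc, '//tr').each do |tr|
  cells = tr.elements.to_a('td').map { |td| "#{td.attributes['class']}=#{td.text}" }
  puts cells.join(', ')
end
```

This only works when the markers are unique and nothing inside the slice is
malformed (or nested in a way that makes the first closing marker the wrong
one); anything messier still needs a forgiving parser.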
http://www.ruby-doc.org - Ruby Help & Documentation
http://www.artima.com/rubycs/ - Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools