I have a few questions about parsing HTML:
The default docs (rdoc) for HTMLParser (the one that comes with the
Win32 binary distribution) in Ruby are very poor. Where can I find
some good documentation of the module, or better yet a tutorial?
Another question: is HTMLParser modeled on Perl’s HTML::Parser?
Can someone suggest the best parser for tokenizing an HTML document
and building a tree from it? Hpricot looks like a nice parser and is
well documented, but I’m not sure it’s suitable.
Thanks in advance