I have an HTML document which I need to parse into a tree. I was
looking for a suitable module to use, and found two candidates:
Hpricot: while it looks like a nice and fast HTML parser, its
interface appears completely unsuitable for parsing an HTML
file into a tree. Is this possible, and I'm just missing something?
htree: looks closer to what I need, but its documentation is very
poor (almost nonexistent). When I ‘pp’ a parsed HTree document I see a
representation, but how can I actually traverse the tree? At the
moment I’m using reflection to “disassemble” the htree tree structure.
There must be a better way! Can someone please provide an example of
how to recursively print out the tree, indicating for each node what
kind of node it is?
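For reference, here is a minimal sketch of the kind of recursive dump I'm after. It does not use Hpricot or htree: it swaps in REXML, the XML parser that ships with Ruby, and so assumes the markup is well-formed XHTML (REXML will reject typical tag-soup HTML). The helper name `dump_tree` is hypothetical.

```ruby
require 'rexml/document'

# Walk a REXML node recursively and collect one line per node, naming its
# kind and indenting by depth. Whitespace-only text nodes are skipped.
def dump_tree(node, depth = 0, out = [])
  indent = '  ' * depth
  case node
  when REXML::Document
    # The document is only a container; its children are printed instead.
    # (Checked before Element because Document is a subclass of Element.)
  when REXML::Element
    out << "#{indent}element: #{node.name}"
  when REXML::Text
    text = node.value.strip
    out << "#{indent}text: #{text.inspect}" unless text.empty?
  when REXML::Comment
    out << "#{indent}comment: #{node.string.strip.inspect}"
  else
    # Doctypes, processing instructions, etc.: just report the class.
    out << "#{indent}other: #{node.class}"
  end

  # Only REXML::Parent subclasses (Document, Element, ...) have children.
  if node.is_a?(REXML::Parent)
    child_depth = node.is_a?(REXML::Document) ? depth : depth + 1
    node.children.each { |child| dump_tree(child, child_depth, out) }
  end
  out
end

html = '<html><head><title>Hi</title></head>' \
       '<body><!-- note --><p>Hello <b>world</b></p></body></html>'
puts dump_tree(REXML::Document.new(html))
```

Each output line names the node kind (element, text, comment, or the class for anything else). The same recursive shape should carry over to a real HTML parser, but it would need that parser's own node classes and child accessor.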
Are there other options?
Thanks in advance