Hpricot question

farfignugen · July 31, 2006, 3:58pm

I’m trying to use Hpricot to clean up the text in a big site full of
old-style HTML. I’m just trying to do things like replacing literal
quote characters with opening and closing quote markup. I’m hampered
by the fact that my understanding of the HTML DOM comes from reading
one web site yesterday, and I don’t know any JavaScript. Nonetheless,
it seems that Hpricot should be able to easily give me all the text in
the body element of each page, because it has a traverse_text() method.
The problem seems to be that if I apply it to a whole page, I also get
text from outside that element, and all the methods for selecting seem
to return an element, not a tree.

There is a get_subnode method but it doesn’t seem to work as expected.
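
To make the goal concrete, here is a rough sketch of the cleanup described above. It rests on several assumptions: the replacement strings (&ldquo; and &rdquo;) are placeholders, since the actual replacements are not shown in this post; curl_quotes and clean_body are hypothetical helper names; and the Hpricot calls (Hpricot(), at, traverse_text, a writable content on text nodes, to_html) are taken from later Hpricot releases and may not match the 2006 version. The idea is that an element returned by a selector is itself the root of a subtree, so traverse_text can be called on it directly.

```ruby
# Replace straight double quotes with alternating opening/closing marks.
# The default marks are assumptions; pass whatever markup you actually want.
def curl_quotes(text, open_mark = "&ldquo;", close_mark = "&rdquo;")
  opening = true
  text.gsub('"') do
    mark = opening ? open_mark : close_mark
    opening = !opening   # flip between opening and closing on each match
    mark
  end
end

# Hypothetical wrapper: rewrite only the text nodes under <body>.
def clean_body(html)
  require "hpricot"      # third-party gem, loaded only when this is called
  doc  = Hpricot(html)
  body = doc.at("body")  # first element matching "body", or nil
  return doc.to_html if body.nil?

  # Visit every text node below <body>; the rest of the page is untouched.
  body.traverse_text do |text|
    text.content = curl_quotes(text.content)
  end
  doc.to_html
end
```

Note that the quote state resets for each text node, so a quotation that opens in one node and closes in another (for example around a bold word) would be marked up incorrectly; a real cleanup would need to carry that state across nodes.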

Thanks in advance for any help

–
The folly of mistaking a paradox for a discovery, a metaphor for a
proof, a torrent of verbiage for a spring of capital truths, and
oneself for an oracle, is inborn in us.
-Paul Valery, poet and philosopher (1871-1945)

farfignugen · August 1, 2006, 4:43am

On Jul 31, 2006, at 6:17 AM, Chris G. wrote:

> There is a get_subnode method but it doesn’t seem to work as expected.

Never mind,

The reason get_subnode gives:
…hpricot/traverse.rb:23:in `get_subnode': undefined method `get_subnode_internal' for #<Hpricot::Doc:0x5c182c>

is because Why literally hasn’t written get_subnode_internal yet.
Maybe I’ll try to write it when/if I get some time.

–
For blocks are better cleft with wedges,
Than tools of sharp or subtle edges,
And dullest nonsense has been found
By some to be the most profound.
-Samuel Butler,