Nokogiri: parsing tags

introvert · November 11, 2009, 1:27pm

Hello,

I'm trying to use the Nokogiri gem to parse individual XHTML tags (not a
whole HTML document) and perform some processing on them.

Here's an example of a string that I want to process:

str = '
some texttest …'

The following code adds html/body and head tags, and also a DOCTYPE
declaration, which I don't want. (I know I could go through the html->body
children of the root node, but I suspect Nokogiri has a better way to get
just the part I passed in.)

require 'nokogiri'

f = Nokogiri::HTML(str)
f.search('//img').each do |node|
  # some processing
end
puts f

If I try to use an XML fragment instead:

f = Nokogiri::XML.fragment(str)
f.search('//img').each do |node|
  # node.remove
end
puts f

This code doesn't seem to parse the HTML string, but it does print it
without adding the standard tags.

What am I doing wrong?

Many thanks for help!

introvert · November 12, 2009, 9:39am

Use Nokogiri::HTML.fragment(str).

introvert · November 12, 2009, 7:47pm

G_ F_ wrote:

Use Nokogiri::HTML.fragment(str).

If I use this, it won't parse the image tags. Any idea?