Using Nokogiri to insert a <span> tag into existing text

All,

I’m using Nokogiri to handle the following problem:

I have a piece of HTML, and for certain text nodes, I need to insert
<span> tags into the text of these nodes at a certain place.

What I am doing is finding the place where I want to insert the <span>,
let’s say index X of the text node’s text, and doing the following:

  1. Setting the node’s text to just what is before index X
  2. Adding the <span> as a next sibling to the original node
  3. Adding another next sibling to the original node that is another text
    node, whose contents are the rest of the text in the original node.

I can pass a string with “<span>blah</span>” to Node#add_next_sibling
to handle #2.

I’m having trouble with creating a new text node and passing it to
Node#add_next_sibling though.

  1. Does anyone have an example of creating a new text node in
    Nokogiri?

  2. Is there a simpler way to do this than splitting up one text node
    into 3 nodes?

Many thanks,
Wes

On Feb 19, 2011, at 7:39 PM, Wes G. wrote:

  1. Does anyone have an example of creating a new text node in
    Nokogiri?

  2. Is there a simpler way to do this than splitting up one text node
    into 3 nodes?

Yes. If you have a handle to that text node already, simply use the
content= method to write your new html into it as text. Build that
text up using a regular expression or concatenation in normal Ruby
text processing mode. As long as you don’t need to further modify that
node as if it was a nodeset, this will be the simplest method I can
think of.

If you later need to access that span as a new Nokogiri node, you will
have to do something more complex.

Walter

Thanks Walter, ended up with this:

require 'nokogiri'

doc = Nokogiri::HTML.parse(File.new(merge_path))
nodes = doc.xpath("//text()[contains(.,'MERGE')]")
nodes.each do |node|
  text = node.text
  if md = text.match(/.{2}MERGE(\d+).{2}/)
    start_index = text.index(md[0])
    start_span_tag = "<span>"
    # Position of the closing tag, after the opening tag has been inserted
    end_index = start_index + start_span_tag.length + md[0].length
    node.content = text.insert(start_index, start_span_tag).insert(end_index, '</span>')
  end
end

Works great.

Wes

This doesn’t work - when I write this back out to a file the <span> is
escaped.

Perhaps I should have mentioned that I needed to re-serialize the
resulting HTML.

Wes

Node.content= takes text, not more nodes. Try using Node.inner_html =
instead.

Walter

I tried Node.inner_html= to no avail, setting it to the string that
resulted when I interpolated the <span> where I wanted it. Not sure why
it didn’t work, but the replace/after approach works, so I went with that.

Thanks for the help.

Wes

This works:

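# Surround all of the text (NOT attribute value) merge fields with
# <span> tags for ease of manipulation later

require 'nokogiri'

doc = Nokogiri::HTML.parse(html)
nodes = doc.xpath("//text()[contains(.,'MERGE')]")
nodes.each do |node|
  text = node.text.dup
  if md = text.match(/.{2}MERGE(\d+).{2}/)
    start_index = text.index(md[0])
    end_index = start_index + md[0].length
    # Replace the text node with the text before the merge field
    # (assumes some text precedes the match, so replace returns a node)
    node = node.replace(text[0...start_index])[0]
    # Add the wrapped merge field and the rest of the text after it
    node.after("<span>#{md[0]}</span>#{text[end_index..-1]}")
  end
end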

That’s because Node.inner_html takes nodes, not strings as input. Glad
you got it working.

Walter