All,
I’m using Nokogiri to handle the following problem:
I have a piece of HTML, and for certain text nodes, I need to insert
<span> tags into the text of these nodes at a certain place.
What I am doing is finding the place where I want to insert the <span>,
let’s say index X of the text node’s text, and doing the following:
1. Setting the node’s text to just what is before index X
2. Adding the <span> as a next sibling to the original node
3. Adding another next sibling to the original node that is another text
node, whose contents are the rest of the text in the original node.
I can pass a string with “<span>blah</span>” to Node#add_next_sibling to
handle #2.
I’m having trouble with creating a new text node and passing it to
Node#add_next_sibling though.
Does anyone have an example of creating a new text node in
Nokogiri?
Is there a simpler way to do this than splitting up one text node
into 3 nodes?
Many thanks,
Wes
weyus
February 20, 2011, 5:43pm
2
On Feb 19, 2011, at 7:39 PM, Wes G. wrote:
> I’m having trouble with creating a new text node and passing it to
> Node#add_next_sibling though.
> Does anyone have an example of creating a new text node in
> Nokogiri?
> Is there a simpler way to do this than splitting up one text node
> into 3 nodes?
Yes. If you have a handle to that text node already, simply use the
content= method to write your new html into it as text. Build that
text up using a regular expression or concatenation in normal Ruby
text processing mode. As long as you don’t need to further modify that
node as if it was a nodeset, this will be the simplest method I can
think of.
If you later need to access that span as a new Nokogiri node, you will
have to do something more complex.
Walter
weyus
February 25, 2011, 4:58am
3
Thanks Walter, ended up with this:
doc = Nokogiri::HTML.parse(File.new(merge_path))
nodes = doc.xpath("//text()[contains(.,'MERGE')]")
nodes.each do |node|
  text = node.text
  if md = text.match(/.{2}MERGE(\d+).{2}/)
    start_index = text.index(md[0])
    start_span_tag = "<span>"
    end_index = start_index + start_span_tag.length + md[0].length
    node.content = text.insert(start_index, start_span_tag).insert(end_index, '</span>')
  end
end
Works great.
Wes
weyus
February 25, 2011, 5:12am
4
This doesn’t work - when I write this back out to a file the <span> is
escaped.
Perhaps I should have mentioned that I needed to re-serialize the
resulting HTML.
Wes
weyus
February 25, 2011, 5:45am
5
Node.content= takes text, not more nodes. Try using Node.inner_html =
instead.
Walter
weyus
March 1, 2011, 4:03am
6
I tried Node.inner_html= to no avail, setting it to the string that
resulted if I interpolated the <span> where I wanted it. Not sure why
it didn’t work, but the replace/after works, so I went with that.
Thanks for the help.
Wes
weyus
February 25, 2011, 6:50am
7
This works:
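# Surround all of the text (NOT attribute value) merge fields with
# <span> tags for ease of manipulation later
doc = Nokogiri::HTML.parse(html)
nodes = doc.xpath("//text()[contains(.,'MERGE')]")
nodes.each do |node|
  text = node.text.dup
  if md = text.match(/.{2}MERGE(\d+).{2}/)
    start_index = text.index(md[0])
    end_index = start_index + md[0].length
    # Replace the node with the text before the match. Given a string,
    # replace returns a NodeSet, hence the [0]. This assumes the match
    # is not at the very start of the text node.
    node = node.replace(text[0...start_index])[0]
    # Add the wrapped match and the remaining text after it.
    node.after("<span>#{md[0]}</span>#{text[end_index..-1]}")
  end
end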
weyus
March 1, 2011, 6:46am
8
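That’s most likely because the node you had was a text node.
Node.inner_html= replaces a node’s children, and a text node cannot
have any, so it would need to be called on the parent element. Glad
you got it working.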
Walter