(RE)XML question

Question for you all. I want to treat HTML like XML
(which is no big deal).

But I want to find certain “special” tags (not real
HTML) and replace them with my own text.

It’s macro-type stuff. Basically I want to output
the same HTML except for the text that replaced
the special tags.

I can’t find any examples of generating XML with
REXML. It should be easy; I don’t want it to be
too hard.

Contrived example below in case it helps.

How would you do this?

Thanks,
Hal

Input:

Hi, there.

some more text

That's all.

Output:

Hi, there.

I found a foo tag enclosing 'some more text' with bar and bam values of 'this' and 'that'...

That's all.
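Since the question mentions not finding examples of generating XML with REXML, here is a minimal sketch of building a document from scratch with REXML’s standard API (Document#add_element, Element#add_attributes, Element#text=). The element names and values are stand-ins borrowed from the contrived example; the class='note' attribute is purely hypothetical.

```ruby
require 'rexml/document'

# Build a small document from scratch; nothing is parsed here.
doc  = REXML::Document.new
html = doc.add_element('html')
body = html.add_element('body')

# add_element takes an optional hash of attributes.
para = body.add_element('p', 'class' => 'note')   # 'note' is a made-up class
para.text = 'Hi, there.'

# An element can also be built on its own and attached afterwards.
foo = REXML::Element.new('foo')
foo.add_attributes('bar' => 'this', 'bam' => 'that')
foo.text = 'some more text'
body.add_element(foo)

# to_s serializes without pretty-printing; REXML quotes attribute
# values with single quotes by default.
out = doc.to_s
puts out
```

Attribute order in the serialized output is not something to rely on, so compare attributes individually rather than matching the whole tag.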

On Wed, Aug 23, 2006 at 07:15:09AM +0900, [email protected] wrote:

Output:

Hi, there.

I found a foo tag enclosing 'some more text' with bar and bam values of 'this' and 'that'...

That's all.

So, in Hpricot:

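require 'hpricot'

doc = Hpricot("…")
doc.search("foo").each do |ele|
  # Build the replacement paragraph from the tag's name, content and attributes
  new_ele = Hpricot("<p>I found a " + ele.name + " tag enclosing '" +
    ele.inner_html + "' with " + ele.attributes.keys.join(' and ') +
    " values of " + ele.attributes.values.map { |x| "'#{x}'" }.join(' and ') +
    "...</p>")
  # Swap the parsed <p> in for the original <foo> element
  ele.parent.replace_child(ele, new_ele.children.first)
end
puts doc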

REXML has a replace_child as well. But now you’ve motivated me to add
Element#replace.

_why
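For comparison, _why’s attribute-driven approach can be sketched with REXML alone, using REXML::Child#replace_with, which calls the parent’s replace_child for you. The input is an assumption: the original markup in this thread was lost, so the sample below is a hypothetical stand-in reconstructed from the example output.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the lost input, with one <foo> "macro" tag.
input = <<~HTML
  <html><body>
  <p>Hi, there.</p>
  <foo bar="this" bam="that">some more text</foo>
  <p>That's all.</p>
  </body></html>
HTML

doc = REXML::Document.new(input)

# XPath.match returns an array, so replacing nodes inside the loop
# does not disturb the iteration.
REXML::XPath.match(doc, '//foo').each do |ele|
  names  = ele.attributes.keys
  values = names.map { |n| "'#{ele.attributes[n]}'" }
  para = REXML::Element.new('p')
  para.text = "I found a #{ele.name} tag enclosing '#{ele.text}' with " \
              "#{names.join(' and ')} values of #{values.join(' and ')}..."
  # Shorthand for ele.parent.replace_child(ele, para)
  ele.replace_with(para)
end

puts doc
```

Note that REXML escapes apostrophes in text assigned with text=, so the serialized output shows &apos; where the message has '.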

why the lucky stiff wrote:

ele.parent.replace_child(ele, new_ele.children.first)

end
puts doc

REXML has a replace_child as well. But now you’ve motivated me to add Element#replace.

Hmm, the right thing to do and a tasty way to do it.

This motivates me to download Hpricot for the first time
and try it. Probably tomorrow as my brane is fride.

Thanks,
Hal

unknown wrote:

It’s macro-type stuff. Basically I want to output
the same HTML except for the text that replaced
the special tags.

This is what XSLT was designed for and it may provide another option for
you…

ilan

[email protected] wrote:

Contrived example below in case it helps.

input = <<ENDHTML

Hi, there.

some more text

That's all.

ENDHTML

require 'rexml/document'
doc = REXML::Document.new( input )
# Swap each <foo> for a new <p> that describes it
doc.root.each_element( '//foo' ){ |e|
  new_para = REXML::Element.new( 'p' )
  new_para.text = "I found a foo tag enclosing '#{e.text}' with bar and " +
    "bam values of '#{e.attributes['bar']}' and '#{e.attributes['bam']}'..."
  e.parent.replace_child( e, new_para )
}
puts doc

#=> Hi, there.
#=> I found a foo tag enclosing 'some more text' with bar and bam values of 'this' and 'that'...
#=> That's all.
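One caveat with e.text in the REXML version: it returns only the first text child, so a macro tag whose body contains markup would be cut short. The sketch below uses a made-up <b> inside <foo> to compare Element#text, Element#texts, and re-serializing the children (roughly what Hpricot’s inner_html gives).

```ruby
require 'rexml/document'

# Hypothetical macro tag whose body contains markup, not just plain text.
doc = REXML::Document.new('<foo bar="this">some <b>more</b> text</foo>')
foo = doc.root

first_text = foo.text                                          # first Text child only
all_text   = foo.texts.map(&:value).join                       # direct Text children, <b> dropped
inner_xml  = foo.children.map(&:to_s).join                     # children re-serialized
deep_text  = REXML::XPath.match(foo, './/text()').map(&:value).join  # all descendant text

p first_text  # => "some "
p all_text    # => "some  text"
p inner_xml   # => "some <b>more</b> text"
p deep_text
```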

Ilan B. wrote:

It’s macro-type stuff. Basically I want to output
the same HTML except for the text that replaced
the special tags.

This is what XSLT was designed for and it may provide another option for
you…

That makes sense. I’ve never used XSLT, but I’m sure that’s
a viable solution.

_Why’s Hpricot example worked perfectly for me, BTW.

So, a related question.

Suppose I wanted to “nest” macros of this kind. Something like:

<mac1 foo="1" bar="2">My name is
seed-value
today.

Forgive the nonsense example.

Could XSLT handle this easily? Could Hpricot (_why)?

Thanks,
Hal
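On the nesting question, one way to handle it in REXML is a post-order walk: expand inner macros first, so an outer macro sees their already-expanded text. This is a sketch under loud assumptions: mac1 and mac2 and their behavior (mac2 upcases its body, mac1 wraps it with its foo value) are hypothetical, and since the inner tag of the original example was lost, the markup is a guess.

```ruby
require 'rexml/document'

# Hypothetical macro table: tag name => lambda(element, body_text) -> replacement text
MACROS = {
  'mac1' => ->(el, body) { "[#{el.attributes['foo']}:#{body}]" },
  'mac2' => ->(el, body) { body.upcase }
}

# Post-order expansion: child elements are expanded before their parent.
def expand(node)
  node.children.select { |c| c.is_a?(REXML::Element) }.each { |c| expand(c) }
  macro = MACROS[node.name]
  return unless macro && node.parent
  # After inner expansion, the body is just the concatenated text children.
  body = node.texts.map(&:value).join
  node.replace_with(REXML::Text.new(macro.call(node, body)))
end

doc = REXML::Document.new(
  '<p><mac1 foo="1" bar="2">My name is <mac2>seed-value</mac2> today.</mac1></p>'
)
expand(doc.root)
puts doc  # => <p>[1:My name is SEED-VALUE today.]</p>
```

Expanding bottom-up keeps each macro simple: it only ever sees plain text, never unexpanded inner tags.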

[email protected] wrote:

I can’t find any examples of generating XML with

Output:

Hi, there.

I found a foo tag enclosing 'some more text' with bar and bam values of 'this' and 'that'...

That's all.

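# xml-split.rb is not part of Ruby's standard library; its source is not shown here
require 'xml-split'

tag = 'foo'
DATA.read.xml_split(tag).each {|stuff|
  if stuff.is_a?(String)
    # Text outside any <foo> tag is copied through unchanged
    print stuff
  else
    # stuff[0] is parsed into an attribute hash; stuff[1] is the enclosed text
    attr = stuff[0].xml_parse
    puts "<p>I found a #{tag} tag enclosing '#{stuff[1]}' with"
    print "#{attr.keys.join(' and ')} values of "
    print "'#{attr.values.join("' and '")}'...</p>"
  end
}

__END__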

Hi, there.

some more text

That's all.

---- output ----

Hi, there.

I found a foo tag enclosing 'some more text' with bam and bar values of 'that' and 'this'...

That's all.
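The xml-split approach works on the raw string rather than a parsed tree. A comparable sketch using only String#gsub and a regular expression (a swapped-in technique, not the poster’s library) is below; it assumes flat, non-nested foo tags with quoted attribute values, and the input is a hypothetical stand-in for the lost markup.

```ruby
# Hypothetical input; the original markup was lost in extraction.
html = <<~HTML
  <p>Hi, there.</p>
  <foo bar="this" bam="that">some more text</foo>
  <p>That's all.</p>
HTML

tag = 'foo'
# Non-greedy match of <foo ...>...</foo>; nested <foo> tags are NOT handled.
pattern = %r{<#{tag}\b([^>]*)>(.*?)</#{tag}>}m

result = html.gsub(pattern) do
  attrs, body = $1, $2
  # Collect name="value" (or name='value') pairs in source order.
  pairs  = attrs.scan(/(\w+)\s*=\s*["']([^"']*)["']/)
  names  = pairs.map(&:first).join(' and ')
  values = pairs.map { |_, v| "'#{v}'" }.join(' and ')
  "<p>I found a #{tag} tag enclosing '#{body}' with #{names} values of #{values}...</p>"
end

puts result
```

Unlike a hash-backed attribute list, the scan keeps attributes in source order, so this prints "bar and bam" rather than "bam and bar". The trade-off is fragility: a regex cannot track nesting, which is where a real parser such as REXML or Hpricot earns its keep.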

unknown wrote:

_Why’s Hpricot example worked perfectly for me, BTW.

So, a related question.

Suppose I wanted to “nest” macros of this kind. Something like:

<mac1 foo="1" bar="2">My name is
seed-value
today.

Forgive the nonsense example.

Could XSLT handle this easily? Could Hpricot (_why)?

Thanks,
Hal

Yes, both techniques could handle nested elements. I don’t know what XML
tools you are using, but many come with XSLT support built in. XSLT
allows any XML (XHTML) document to be transformed into any other. At one
time it was slated to replace CSS, but that never seemed to materialize.
Nowadays it’s mostly used in report generation and XML-RPC filtering,
but of course it has many uses. The disadvantages of XSLT are that it can
be rather challenging to debug and it can grow to be very verbose in
non-trivial transformations. The advantage is that it is a W3C standard
and practically every platform/language supports it in one form or
another.

I have no experience with Hpricot, but if you are already using Ruby as
your main processor then I would probably stick with Hpricot, as the
solutions above look much cleaner than an XSLT solution :) Oh… and
lastly, if you don’t use XSLT/XPath on a regular basis, you can easily
forget its semantics and have to keep referring back to the docs, or at
least I do.

ilan