Get the content of an XML element using Hpricot

Bonita · April 13, 2007, 9:48am

Hi

I’m using hpricot to parse the following file.

[from morwyn] * HTML for the Conceptually Challenged http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn HTML for the Conceptually Challenged. Very basic tutorial, plainly worded for people who hate to read instructions. morwyn 2006-10-10T07:28:28Z html imported webpagedesign

I’m trying to get the content of dc:subject like this:

require 'hpricot'

doc = Hpricot.parse(File.read("965.xhtml"))
(doc/"item").each do |t|
  puts (t/"dc:subject").innerTEXT
end

but I got

<dc:subject>html internet tutorial web</dc:subject>

whereas I only need “html internet tutorial web”.

Does anyone know which method I should call?

Thanks
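A minimal sketch of the difference, using REXML (which ships with Ruby) instead of Hpricot, so the calls below are REXML's and not Hpricot's: serializing the element returns its markup, tags included, while its text accessor returns only the character data inside it. The one-item document is a hypothetical stand-in for the feed, and the Dublin Core namespace URI is assumed.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down feed item; the dc prefix is bound to the
# (assumed) Dublin Core elements namespace.
xml = <<~XML
  <item xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:subject>html internet tutorial web</dc:subject>
  </item>
XML

doc = REXML::Document.new(xml)
subject = doc.root.elements['dc:subject']

# Serializing the element gives back its markup, tags included.
markup = subject.to_s
# The text accessor returns only the character data inside the element.
content = subject.text

puts markup   # <dc:subject>html internet tutorial web</dc:subject>
puts content  # html internet tutorial web
```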

Bonita · April 13, 2007, 12:45pm

On Apr 13, 9:48 am, Bonita [email protected] wrote:

<dc:creator>morwyn</dc:creator>

but I got
Posted via http://www.ruby-forum.com/.

Bonita · April 13, 2007, 12:50pm

Sorry for deleting your text.

Maybe you can try:

puts (t/"dc:subject").text
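The same per-item loop, sketched with REXML rather than Hpricot under the same assumptions (a hypothetical two-item document with a made-up root element, and an assumed namespace URI): for each item, take the text of its dc:subject child instead of the element itself.

```ruby
require 'rexml/document'

# Hypothetical feed with two items; a real del.icio.us feed has more elements.
xml = <<~XML
  <root xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item><dc:subject>html internet tutorial web</dc:subject></item>
    <item><dc:subject>html imported webpagedesign</dc:subject></item>
  </root>
XML

doc = REXML::Document.new(xml)

subjects = []
# Visit every <item>, roughly what (doc/"item").each does in Hpricot.
REXML::XPath.each(doc, '//item') do |item|
  subject = item.elements['dc:subject']
  # Keep only the text content; items without a dc:subject are skipped.
  subjects << subject.text if subject
end

subjects.each { |s| puts s }
```

Items that lack a dc:subject are skipped rather than raising a NoMethodError on nil.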


Bonita · April 13, 2007, 12:50pm

On Apr 13, 12:40 pm, [email protected] wrote:

HTML for the Conceptually Challenged. Very basic tutorial,
</taxo:topics>
end


puts (t/'dc:subject').text

Sorry for the double post; I shouldn’t have copied and pasted the result
directly from irb.