Hpricot innerTEXT?

Hi

I’m using Hpricot to parse the following file:

[from morwyn] * HTML for the Conceptually Challenged http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn HTML for the Conceptually Challenged. Very basic tutorial, plainly worded for people who hate to read instructions. morwyn 2006-10-10T07:28:28Z html imported webpagedesign

I’m trying to get the content of dc:subject like this:

doc = Hpricot.parse(File.read("965.xhtml"))

(doc/"item").each do |t|
  puts (t/"dc:subject").innerTEXT
end

but I got

<dc:subject>html internet tutorial web</dc:subject>

when I only need “html internet tutorial web”.

Does anyone know the right method to call?

Thanks

On Apr 13, 10:11 am, Bontina C. [email protected] wrote:

<dc:creator>morwyn</dc:creator>

but I got
Posted via http://www.ruby-forum.com/.
Replace innerTEXT with inner_html:

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_html
end

regards
Lionel
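What the OP saw is an element's full markup, while what he wanted is its text content. The distinction does not depend on Hpricot: the sketch below swaps in Ruby's bundled REXML library, and the one-item RDF document is a hypothetical, simplified stand-in for 965.xhtml (only the `dc` and `rdf` namespaces are declared).

```ruby
require "rexml/document"

# Hypothetical, simplified stand-in for one item of the OP's file.
SAMPLE = <<~RDF
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item>
      <dc:subject>html internet tutorial web</dc:subject>
    </item>
  </rdf:RDF>
RDF

# Prefix-to-URI map used to resolve "dc:" in the XPath expression.
DC = { "dc" => "http://purl.org/dc/elements/1.1/" }

doc = REXML::Document.new(SAMPLE)
subject = REXML::XPath.first(doc, "//item/dc:subject", DC)

markup = subject.to_s # the whole element, tags included
text   = subject.text # only the character data inside the element

puts(markup)
puts(text)
```

`to_s` serializes the element with its tags, which matches what appeared on screen; `text` returns only the character data, which is what the question asks for.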

Lionel Orry wrote:

replace innerTEXT by inner_html:

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_html
end

regards
Lionel

Thx for your response, but I still get
<dc:subject>html internet tutorial web</dc:subject>

On Fri, Apr 13, 2007 at 08:45:08PM +0900, chickenkiller wrote:

puts (t/"dc:subject").inner_html

In fact, inner_text works as well. But you should have a look at the
warnings from Ruby! The inner_text or inner_html method is applied
to the return value of 'puts (t/"dc:subject")', which is nil.
So a warning appears:
rdf.rb:6: undefined method `inner_html' for nil:NilClass (NoMethodError)

That’s not a warning, that’s an exception, and the program will terminate at
that point. The OP didn’t mention any errors.
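Brian's distinction can be checked in plain Ruby. The sketch below is a hypothetical reconstruction, not code from the thread: it mimics the misparse, where `puts` prints its argument and returns nil, and `inner_html` is then called on that nil. The result is a NoMethodError, which stops execution unless it is rescued.

```ruby
finished = false
error = nil

begin
  # puts prints the string and returns nil; inner_html is then sent to nil.
  puts("<dc:subject>html internet tutorial web</dc:subject>").inner_html
  finished = true # never reached: the exception skips the rest of the block
rescue NoMethodError => e
  error = e
end

puts("raised #{error.class}: #{error.message}")
```

The exact wording of the message differs between Ruby versions, but the exception class does not.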

but 'puts (t/"dc:subject")' is still executed, so
'<dc:subject>html internet tutorial web</dc:subject>' is displayed anyway. Therefore I
recommend adding a few parentheses there:

puts((t/"dc:subject").inner_text)

and it should work this time.

Next time, look at the warnings!!! ;-)

Good point, but it was OK the way he wrote it, with a space after puts.

irb(main):003:0> p (1+3).to_s
"4"
=> nil
irb(main):004:0> p(1+3).to_s
4
=> ""

In the first case, this is p( (1+3).to_s )

In the second case, this is ( p(1+3) ).to_s # i.e. nil.to_s
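The two readings above can be reproduced without depending on what `p` itself returns. In this sketch, `show` is a hypothetical stand-in for `p`/`puts` that records the argument it receives and returns nil:

```ruby
RECEIVED = []

# Hypothetical stand-in for p/puts: records its argument, returns nil.
def show(arg)
  RECEIVED << arg
  nil
end

# Parentheses directly after the method name form the argument list,
# so .to_s is applied to show's return value: show receives 4,
# and a is nil.to_s, i.e. "".
a = show(1+3).to_s

# Putting the whole expression inside the call's parentheses makes the
# intent explicit: show receives "4", and b is show's return value, nil.
b = show((1+3).to_s)

p RECEIVED # prints [4, "4"]
p a        # prints ""
p b        # prints nil
```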

On Apr 13, 12:10 pm, Bontina C. [email protected] wrote:


Thx for your response, but I still get
<dc:subject>html internet tutorial web</dc:subject>



In fact, inner_text works as well. But you should have a look at the
warnings from Ruby! The inner_text or inner_html method is applied
to the return value of 'puts (t/"dc:subject")', which is nil.
So a warning appears:
rdf.rb:6: undefined method `inner_html' for nil:NilClass (NoMethodError)

but 'puts (t/"dc:subject")' is still executed, so
'<dc:subject>html internet tutorial web</dc:subject>' is displayed anyway. Therefore I
recommend adding a few parentheses there:

puts((t/"dc:subject").inner_text)

and it should work this time.

Next time, look at the warnings!!! ;-)

regards
Lionel

On Apr 13, 1:53 pm, Brian C. [email protected] wrote:

That’s not a warning, that’s an exception, and the program will terminate at
that point. The OP didn’t mention any errors.

Indeed, I used the term 'warning' very loosely, and I apologize for that.
This is an exception and nothing else.

In the second case, this is ( p(1+3) ).to_s # i.e. nil.to_s

Hmm, interesting. It seems that the problem only arises inside a
block:

(Output is shown in comments.)

require 'hpricot'

doc = Hpricot(File.open("rdf.xhtml"))

puts (doc/"item"/"dc:subject").inner_text
# html imported webpagedesign

(doc/"item").each do |t|
  puts((t/"dc:subject").inner_text)
end
# html imported webpagedesign

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_text
end
# <dc:subject>html imported webpagedesign</dc:subject>
# rdf.rb:12: warning: don't put space before argument parentheses
# rdf.rb:12: undefined method `inner_text' for nil:NilClass (NoMethodError)
#         from rdf.rb:11:in `each'
#         from rdf.rb:11

I wonder what the difference between the last two blocks is.
Any ideas?

Lionel

On Fri, Apr 13, 2007 at 10:40:05PM +0900, chickenkiller wrote:

Hmm, it looks like this can be reproduced without
Hpricot:

$ cat x.rb
x = 3
puts (x-5).abs

1.times do
  puts (x-5).abs
end
$ ruby -v
ruby 1.8.4 (2005-12-24) [i486-linux]
$ ruby x.rb
x.rb:5: warning: don't put space before argument parentheses
2
-2
x.rb:5: undefined method `abs' for nil:NilClass (NoMethodError)
        from x.rb:4
$

Congratulations, I think you’ve found a bug in the parser :-) I’ll post this
example to ruby-core.

Regards,

Brian.

On Apr 13, 3:48 pm, Brian C. [email protected] wrote:

Thanks for your help. I get the same output with this version:

ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32]

regards,
Lionel

On Apr 13, 2007, at 10:48 PM, Brian C. wrote:

Inside a do...end or {} block, use this:

puts((x - 5).abs)

It is more explicit, and it works correctly.

So instead of

(doc/"item").each do |t|
  puts (t/"dc:subject").inner_html
end

write

(doc/"item").each do |t|
  puts((t/"dc:subject").inner_html)
end
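As a self-contained check of this advice, using Brian's x.rb values instead of Hpricot, the sketch below captures what `puts((x - 5).abs)` writes from inside a block. `StringIO` is only there to capture the output for inspection.

```ruby
require "stringio"

x = 3

# Temporarily redirect $stdout so that what puts writes can be inspected.
captured = StringIO.new
original_stdout = $stdout
$stdout = captured
begin
  1.times do
    # The parentheses right after puts enclose the whole argument,
    # so .abs is applied to (x - 5), not to the return value of puts.
    puts((x - 5).abs)
  end
ensure
  $stdout = original_stdout
end

puts("captured: #{captured.string.inspect}") # prints captured: "2\n"
```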

I prefer this version for the initial problem:

irb(main):045:0> elements = doc.search('dc:subject/text()')
=> #<Hpricot::Elements["html imported webpagedesign"]>

irb(main):048:0> elements.first.to_s
=> "html imported webpagedesign"
irb(main):049:0> elements.first.parent
=> {elem <dc:subject> "html imported webpagedesign" </dc:subject>}
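The same idea, selecting the text nodes rather than the elements, can be sketched without Hpricot using Ruby's bundled REXML library. The two-item feed below is a hypothetical, simplified stand-in for the real file; the `dc` prefix is mapped to the Dublin Core elements namespace.

```ruby
require "rexml/document"

# Hypothetical, simplified stand-in for the feed.
FEED = <<~RDF
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item><dc:subject>html imported webpagedesign</dc:subject></item>
    <item><dc:subject>html internet tutorial web</dc:subject></item>
  </rdf:RDF>
RDF

NAMESPACES = { "dc" => "http://purl.org/dc/elements/1.1/" }

doc = REXML::Document.new(FEED)

# Select the text nodes themselves, analogous to 'dc:subject/text()' above.
text_nodes = REXML::XPath.match(doc, "//dc:subject/text()", NAMESPACES)
subjects = text_nodes.map(&:value) # Text#value returns the unescaped string

# Each text node's parent is the enclosing dc:subject element.
parent_names = text_nodes.map { |t| t.parent.expanded_name }

subjects.each { |s| puts(s) }
```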