Nokogiri: to_s WITHOUT the surrounding HTML tags?

josh · October 13, 2009, 11:53pm

Hi all

n = Nokogiri::HTML("<h1>H1</h1>")
n.to_s

=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><h1>H1</h1></body></html>\n"

Is there a method that only outputs the stuff I’ve read, and not the
whole valid XHTML stuff?

Needed output:

<h1>H1</h1>

Thanks a lot
Josh

josh · October 14, 2009, 12:01am

Joshua M. wrote:

> Is there a method that only outputs the stuff I’ve read, and not the
> whole valid XHTML stuff?

If all you need is the original input, then why bother running it
through Nokogiri?

Best,

Marnen Laibow-Koser
http://www.marnen.org

josh · October 14, 2009, 12:47am

> If all you need is the original input, then why bother running it
> through Nokogiri?

Obviously that’s just for presentation purposes… I will apply some
other stuff to the DOM, too…

josh · October 14, 2009, 1:03am

Joshua M. wrote:

> Obviously that’s just for presentation purposes… I will apply some
> other stuff to the DOM, too…

OK. So try again with an example that is closer to what you actually
need.

Best,

Marnen Laibow-Koser
http://www.marnen.org

josh · October 14, 2009, 1:13am

> OK. So try again with an example that is closer to what you actually
> need.

The example above is exactly what I need. I think my question is quite
simple?

josh · October 14, 2009, 1:35am

On Tue, Oct 13, 2009 at 2:53 PM, Joshua M. wrote:

> n = Nokogiri::HTML("<h1>H1</h1>")
> n.to_s
>
> => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" ...
>
> Is there a method that only outputs the stuff I’ve read, and not the
> whole valid XHTML stuff?

Well, it’s not XHTML, if you note the doctype but …

> Needed output:
>
> <h1>H1</h1>

n = Nokogiri::HTML("<h1>H1</h1>").xpath('//h1').to_xml
=> "<h1>H1</h1>"

I would think that’d be pretty apparent from a glance at the examples
in the rdoc, btw. Just sayin’

–
Hassan S.
twitter: @hassan

josh · October 14, 2009, 1:50am

On Tue, Oct 13, 2009 at 2:53 PM, Joshua M. wrote:

> Is there a method that only outputs the stuff I’ve read, and not the
> whole valid XHTML stuff?
>
> Needed output:
>
> <h1>H1</h1>

You can do the following:

Nokogiri::HTML("<h1>H1</h1>").css('h1')

Good luck,

-Conrad

josh · October 14, 2009, 10:20am

Thank you, guys. I didn’t know that Nokogiri creates a complete (X)HTML
DOM when reading an incomplete structure.

Have a great day!

josh · October 14, 2009, 1:56am

On Tue, Oct 13, 2009 at 4:49 PM, Conrad T. wrote:

> You can do the following:
>
> Nokogiri::HTML("<h1>H1</h1>").css('h1')

The example above returns the matching DOM nodes (a NodeSet), and adding
‘to_s’ will give you the string.

josh · October 14, 2009, 10:42am

Joshua M. wrote:

> Thank you, guys. I didn’t know that Nokogiri creates a complete (X)HTML
> DOM when reading an incomplete structure.
>
> Have a great day!

Another question. I can’t get Nokogiri to produce XHTML. So for example
I get unclosed tags instead of closed ones. I found some options
(http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/ParseOptions.html)
but I don’t really know how to get them to work…