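require "nokogiri"
doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
doc.class # => Nokogiri::HTML::Document
doc = Nokogiri::HTML::Document.parse <<-eof
<head>
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Ståle Refsnes">
<meta charset="UTF-8">
</head>
eof
doc.class # => Nokogiri::HTML::Document
doc.meta_encoding # => nil
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><head>
# >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
# >> <meta name="description" content="Free Web tutorials">
# >> <meta name="keywords" content="HTML,CSS,XML,JavaScript">
# >> <meta name="author" content="Ståle Refsnes">
# >> <meta charset="UTF-8">
# >> </head></html>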
Why does Nokogiri::HTML::Document#meta_encoding return nil?
I think it does not recognize the HTML5 meta charset, since the following works:
doc.meta_encoding = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
puts doc.meta_encoding
I am still getting nil:
doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
doc.meta_encoding = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
doc.meta_encoding # => nil
Love U Ruby wrote:
eof
I think the problem is that when Nokogiri parses HTML, it assumes HTML 4.0 Transitional, as evidenced by the DOCTYPE.
I’m not sure how to get it to deal with HTML5…
Love U Ruby wrote:
I am still getting nil:
doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
doc.meta_encoding = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
doc.meta_encoding # => nil
You’re confusing #new with #parse, as well as what the input to #meta_encoding= should be: it takes an encoding name, not a meta tag.
irb(main):027:0> doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
#<Nokogiri::HTML::Document:0x4c2aa20 name="document"
children=[#<Nokogiri::XML::DTD:0x4c35fc4 name="html">,
#<Nokogiri::XML::Element:0x4c35a38 name="html"
children=[#<Nokogiri::XML::Element:0x4c3565a name="head"
children=[#<Nokogiri::XML::Element:0x4c35470 name="title"
children=[#<Nokogiri::XML::Text:0x4c35196 " Save the page! ">]>]>]>]>
irb(main):028:0> puts doc
Save the page!
nil
irb(main):029:0> doc.meta_encoding
"UTF-8"
irb(main):030:0> doc.meta_encoding="ISO-8599-2"
"ISO-8599-2"
irb(main):031:0> doc.meta_encoding
"ISO-8599-2"
Also, since you are parsing fragments instead of documents, you really
should be using DocumentFragment instead of Document.
irb(main):032:0> docf = Nokogiri::HTML::DocumentFragment.parse "<title> Save the Page! </title>"
#<Nokogiri::HTML::DocumentFragment:0x4d30514 name="#document-fragment"
children=[#<Nokogiri::XML::Element:0x4d303ca name="title"
children=[#<Nokogiri::XML::Text:0x4d300a0 " Save the Page! ">]>]>
irb(main):033:0> puts docf
<title> Save the Page! </title>
nil
irb(main):034:0> docf.respond_to?(:meta_encoding)
false
Since the encoding only makes sense when you assemble the entire
document to send it out to the browser, fragments don’t care.
What remains is still how to get Nokogiri to recognize and emit HTML5.
Tamara T. wrote in post #1111724:
Love U Ruby wrote:
eof
You have hit my exact pain point. I am always confused about when to use these two methods:
Nokogiri::HTML::Document.new
Nokogiri::HTML::Document.parse
==================================
require "nokogiri"
require 'pp'
doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
doc.class # => Nokogiri::HTML::Document
doc
=> #(Document:0x4592d16 {
  name = "document",
  children = [
    #(DTD:0x4592244 { name = "html" }),
    #(Element:0x458dd7a {
      name = "html",
      children = [
        #(Element:0x45871d2 {
          name = "head",
          children = [
            #(Element:0x458161a {
              name = "title",
              children = [ #(Text " Save the page! ")]
              })]
          })]
      })]
  })
doc = Nokogiri::HTML::Document.new("<title> Save the page! </title>")
doc.class # => Nokogiri::HTML::Document
doc
=> #(Document:0x4578128 {
  name = "document",
  children = [ #(DTD:0x45714fe { name = "html" })]
  })
Both methods create a Nokogiri::HTML::Document object, but when I print the two objects the output differs. My questions are:
(a) Why do Nokogiri::HTML::Document.parse and Nokogiri::HTML::Document.new build the document object differently?
(b) What is the proper use case for each, i.e. when should I use which method?
Please help me to digest this basic food.
Thanks
For me it does not work, and I don’t know why:
[1] pry(main)> require "nokogiri"
=> true
[2] pry(main)> doc = Nokogiri::HTML::Document.parse "<title> Save the page! </title>"
=> #(Document:0x46ca5da {
  name = "document",
  children = [
    #(DTD:0x46beef6 { name = "html" }),
    #(Element:0x46be1c2 {
      name = "html",
      children = [
        #(Element:0x46b5158 {
          name = "head",
          children = [
            #(Element:0x46b4974 {
              name = "title",
              children = [ #(Text " Save the page! ")]
              })]
          })]
      })]
  })
[5] pry(main)> doc.meta_encoding
=> nil
[6] pry(main)> doc.meta_encoding="ISO-8599-2"
=> "ISO-8599-2"
[7] pry(main)> doc.meta_encoding
=> nil
[8] pry(main)>
Love U Ruby wrote:
For me it does not work, and I don’t know why:
Me, neither. What version of Nokogiri and Ruby are you using?
Even at that, I haven’t a clue.
Love U Ruby wrote:
You have hit my exact pain point. I am always confused about when to use these two methods:
Nokogiri::HTML::Document.new
Use when creating a new (i.e. NON-EXISTENT) HTML document.
Nokogiri::HTML::Document.parse
Use when parsing an existing COMPLETE HTML document, but NOT a fragment.
Use Nokogiri::HTML::DocumentFragment.parse when parsing a fragment string (i.e., INCOMPLETE DOCUMENT).
Both methods create a Nokogiri::HTML::Document object, but when I print the two objects the output differs. My questions are:
(a) Why do Nokogiri::HTML::Document.parse and Nokogiri::HTML::Document.new build the document object differently?
As stated above, these do two different things. Although you may end up with the same class of object, HOW they go about getting there is different.
(b) What is the proper use case for each, i.e. when should I use which method?
See above.
Please help me to digest this basic food.
Pre-chewed and partially digested.
Tamara T. wrote in post #1111756:
Me, neither. What version of nokogiri and ruby are you using?
Even at that, I haven’t a clue.
Here is my Nokogiri version:
require 'nokogiri'
Nokogiri::VERSION_INFO
=> {"warnings"=>[],
"nokogiri"=>"1.6.0.rc1",
"ruby"=>
{"version"=>"2.0.0",
"platform"=>"i686-linux",
"description"=>"ruby 2.0.0p0 (2013-02-24 revision 39474) [i686-linux]",
"engine"=>"ruby"},
"libxml"=>
{"binding"=>"extension",
"source"=>"packaged",
"libxml2_path"=>
"/home/kirti/.rvm/gems/ruby-2.0.0-p0/gems/nokogiri-1.6.0.rc1/ports/i686-linux-gnu/libxml2/2.8.0",
"libxslt_path"=>
"/home/kirti/.rvm/gems/ruby-2.0.0-p0/gems/nokogiri-1.6.0.rc1/ports/i686-linux-gnu/libxslt/1.1.26",
"compiled"=>"2.8.0",
"loaded"=>"2.8.0"}}
Hmm, I raised this as an issue:
opened 12:05PM - 08 Jun 13 UTC
closed 01:06PM - 14 Jun 13 UTC
```
require "nokogiri"
doc = Nokogiri::HTML::Document.new("<title> Save the pag… e! </title>")
doc.class # => Nokogiri::HTML::Document
doc = Nokogiri::HTML::Document.parse <<-eof
<head>
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Ståle Refsnes">
<meta charset="UTF-8">
</head>
eof
doc.class # => Nokogiri::HTML::Document
doc.meta_encoding # => nil
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><head>
# >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
# >> <meta name="description" content="Free Web tutorials">
# >> <meta name="keywords" content="HTML,CSS,XML,JavaScript">
# >> <meta name="author" content="Ståle Refsnes">
# >> <meta charset="UTF-8">
# >> </head></html>
```
Can you give me one example for the method Nokogiri::HTML::Document.parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML) where url is used? I don’t understand the meaning of the url and options parameters, so I am looking for an example where they are used.
Please advise.
Thanks for all your help!
Love U Ruby wrote:
Can you give me one example for the method parse(string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML) where url is used? I don’t understand the meaning of the url and options parameters, so I am looking for an example where they are used.
Nope, never used that.