Here is the documentation: File: README — Documentation for nokogiri (1.15.3)
Why does the code below not print the full text?
Code:
require 'nokogiri'
html = <<-END
<html>
<head>
<title> A Dirge </title>
<link rel = "schema.DC"
href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Title"
content = "A Dirge">
<meta name = "DC.Creator"
content = "Shelley, Percy Bysshe">
<meta name = "DC.Type"
content = "poem">
<meta name = "DC.Date"
content = "1820">
<meta name = "DC.Format"
content = "text/html">
<meta name = "DC.Language"
content = "en">
</head>
<body><pre>
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
</pre></body>
</html>
END
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.children.each do |ch|
p ch.content if ch.text?
end
Output:
"\n\n \n\n "
"\n\n "
Expected output:
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
On Sun, Apr 14, 2013 at 11:19 AM, Love U Ruby [email protected]
wrote:
–
Posted via http://www.ruby-forum.com/.
If you actually look at the structure of doc, the next-to-last entry
in its children contains children as well, which you need to loop
through. Try this:
(load your code into irb)
require 'pp'
pp doc
and see what the structure is.
On Sun, Apr 14, 2013 at 11:59 AM, tamouse mailing lists
[email protected] wrote:
Follow-up: since you have a complete html document, why treat it as a
fragment? You can call Nokogiri::HTML.parse(html) instead and get the
actual complete document tree with all the proper nesting.
I am just learning Nokogiri for the first time, so I don't know
much about its uses.
Could you please tell me:
when should I use Nokogiri::HTML.parse(html), and when the other?
On Sun, Apr 14, 2013 at 12:07 PM, Love U Ruby [email protected]
wrote:
Could you tell me please?
No. I will tell you this though. You have entirely the wrong
strategy for learning how to be a developer. You have adopted the
strategy of “someone must tell me”. You need to adopt the strategy of
“try things out until I learn what works”. If you get stuck on this
low a level of understanding, you will never progress, and as you have
seen, it just frustrates people whom you continuously run back to with
every single step. You may think you are learning, but you are not at
all learning how to learn, which is the more important step. You are
not learning how to solve problems, especially your own. People are
NOT on this list to teach you. We are not your instructors. We answer
questions out of the goodness of our hearts, but repeated trips to the
well for every sip wear everyone here down. Frankly, it makes me want
to leave this list and go elsewhere. It makes it very unenjoyable, and
very unpleasant.
When should I use Nokogiri::HTML.parse(html), and when the other?
Please compare and contrast the terms “Document” and “Document Fragment”
What do the words “Document” and “Document Fragment” mean to you?
Now, I tried
doc = Nokogiri::HTML::DocumentFragment.parse(html)
pp doc
doc.children.each do |ch|
p ch.content if ch.text?
end
output:
children = [
#(Text "\n\n Rough wind, that moanest loud\n
Grief too sad for song;\n Wild wind, when sullen cloud\n
Knells all the night long;\n Sad storm, whose tears are
vain,\n Bare woods, whose branches strain,\n Deep
caves and dreary main, -\n Wail, for the world’s wrong!\n\n
")]
}),
#(Text "\n\n ")
–
"\n\n \n\n "
"\n\n "
Where do the characters in the middle of the first "\n\n \n\n " go?
I see.
You did not actually read what pp doc told you, did you?
I have given the partial output that I got from pp here.
You copying it in to a message and you reading it are two entirely
different things.
Finally, I got the output I was looking for:
require 'nokogiri'
require 'pp'
html = <<-END
<html>
<head>
<title> A Dirge </title>
<link rel = "schema.DC"
href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Title"
content = "A Dirge">
<meta name = "DC.Creator"
content = "Shelley, Percy Bysshe">
<meta name = "DC.Type"
content = "poem">
<meta name = "DC.Date"
content = "1820">
<meta name = "DC.Format"
content = "text/html">
<meta name = "DC.Language"
content = "en">
</head>
<body><pre>
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
</pre></body>
</html>
END
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.children.each do |ch|
puts ch.child.content if ch.node_name == 'pre'
end
output:
Rough wind, that moanest loud
Grief too sad for song;
Wild wind, when sullen cloud
Knells all the night long;
Sad storm, whose tears are vain,
Bare woods, whose branches strain,
Deep caves and dreary main, -
Wail, for the world's wrong!
On Sun, 14 Apr 2013 18:19:00 +0200, Love U Ruby [email protected]
wrote:
doc = Nokogiri::HTML::DocumentFragment.parse(html)
doc.children.each do |ch|
p ch.content if ch.text?
end
ch.text? will only return true when a node is a text node, i.e., it is
not a tag. Since the document root contains no text itself apart from
whitespace, this just prints the whitespace. Remove the if ch.text?
part to print the contents of everything (or just use doc.content).
Thank you for the hint about pp doc. It helped me greatly. Just one
more thing: can you suggest other ways I could solve the same problem?
I just want to learn Nokogiri. Give me only hints; I will try to solve
the same assignment as above using them.
Thank you very much for your comments.
On Mon, Apr 15, 2013 at 1:09 AM, Love U Ruby [email protected]
wrote:
Can you suggest other ways I could solve the same problem? I just want
to learn Nokogiri. Give me only hints; I will try to solve the same
assignment as above using them.
Write your own Mechanize gem.
I am just looking for guidance on usage: when should I use each of the
following?
Nokogiri::HTML::Document and Nokogiri::HTML::DocumentFragment
And when should I use the parse method of each?
On Apr 20, 2013 2:28 AM, “Love U Ruby” [email protected] wrote:
What do you suppose the meaning of “fragment” is, and why would that
make a distinction?
I understand, but I am looking for the ideal use case for choosing
between them. That is, when must I use Nokogiri::HTML::Document,
and when the other?
On Apr 20, 2013 4:06 AM, “Love U Ruby” [email protected] wrote:
I understand, but I am looking for the ideal use case for choosing
between them. That is, when must I use Nokogiri::HTML::Document,
and when the other?
If you understand the difference, then you have your ‘perfect’ use-case.
Hi,
I wrote the code below:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://www.homeshop18.com/'))
p doc.css("div#megamenu-sub-nav li span:nth-child(2)").map { |x| x.parent.text.strip }
#=> ["books", "clothing", "footwear", "fashion accessories", "health &
#   beauty", "jewellery", "watches", "mobiles", "gsm mobiles\r\rnew", "upto
#   62% off\r\rnew", "camera & camcorders", "computers", "electronics",
#   "home & kitchen", "household appliances", "kids & toys", "gift &
#   flowers", "office & stationery"]
But in the array output, I am getting 2 extra items, "gsm
mobiles\r\rnew" and "upto 62% off\r\rnew", which I don't expect.
Could anyone tell me where I made the mistake?