ruby 1.9.3
nokogiri 1.5.5
Say a web page has a link,
<a href="http://example.com">reference</a>
I would like to get both the url and text, “http://example.com” and
“reference”.
First, open the page that contains this link.
doc = Nokogiri::HTML(open(url))
then,
name = doc.xpath('//div…/a').text
url = doc.xpath('//div…/a/@href').text
It works, but the problem is that this searches the document twice, once
for the text and once for the href.
If you want to apply the same procedure to many links that exist in a
single page, it seems inefficient.
Is there any way to produce both the url and the text in a single pass? Like
def parse_link_and_text(xpath)
…
end
p parse_link_and_text('//div…')
gives a hash
=> {'reference' => 'http://example.com'}
?
On Sun, May 12, 2013 at 1:37 AM, Soichi I. wrote:
Just search for the <a> element and go from there.
$ irb -r nokogiri
irb(main):001:0> dom = Nokogiri.HTML('<x><a href="link">text</a></x>')
=> #<Nokogiri::HTML::Document:0x434197c name="document"
children=[#<Nokogiri::XML::DTD:0x43411d4 name="html">,
#<Nokogiri::XML::Element:0x433df20 name="html"
children=[#<Nokogiri::XML::Element:0x433daac name="body"
children=[#<Nokogiri::XML::Element:0x433d48a name="x"
children=[#<Nokogiri::XML::Element:0x433cfee name="a"
attributes=[#<Nokogiri::XML::Attr:0x433b086 name="href" value="link">]
children=[#<Nokogiri::XML::Text:0x433be5a "text">]>]>]>]>]>
irb(main):002:0> node = dom.at_xpath '//a'
=> #<Nokogiri::XML::Element:0x433cfee name="a"
attributes=[#<Nokogiri::XML::Attr:0x433b086 name="href" value="link">]
children=[#<Nokogiri::XML::Text:0x433be5a "text">]>
irb(main):003:0> node[:href]
=> "link"
irb(main):004:0> node.text
=> "text"
irb(main):005:0>
Now, what is so difficult about that? You can easily find out more via
documentation.
Cheers
robert
Maybe using a temp variable?
links = doc.xpath('//div/a[@href]')
links.map do |x| [x.text, x['href']] end
=> [["reference", "http://example.com"]]
2013/5/12 Soichi I.
Thanks, both replies are helpful!