Is there a way to select links in a scraped Mechanize page using XPath
selectors?
For example, all links in the second TABLE on the page.
I know it is possible with Hpricot, but I need the links to be used by
Mechanize.
On 2008.10.15., at 19:08, Ruby N. wrote:
Is there a way to select links in a scraped mechanize page using XPath
selectors? For example…all links on the second TABLE on the page.
I know it is possible with hpricot but i need the links to be used by
mechanize.
From the Mechanize guide
(http://mechanize.rubyforge.org/mechanize/files/GUIDE_txt.html):
Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:
agent.get('http://someurl.com/').search("//p[@class='posted']")
HTH,
Peter
Peter S. wrote:
On 2008.10.15., at 19:08, Ruby N. wrote:
Is there a way to select links in a scraped mechanize page using XPath
selectors? For example…all links on the second TABLE on the page.
I know it is possible with hpricot but i need the links to be used by
mechanize.
From the Mechanize guide
(http://mechanize.rubyforge.org/mechanize/files/GUIDE_txt.html):
Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:
agent.get('http://someurl.com/').search("//p[@class='posted']")
HTH,
Peter
Wait a minute, it says the total opposite on the Mechanize page. But it
definitely explains why it’s not being friendly with nokogiri…
http://mechanize.rubyforge.org/mechanize/
Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:
agent.get('http://someurl.com/').search(".//p[@class='posted']")
.search("//a")