Forum: Ruby Mechanize and XPath

Ruby N. (Guest)
on 2008-10-15 21:08
Is there a way to select links in a scraped Mechanize page using XPath
selectors?

For example, all links in the second TABLE on the page.

I know it is possible with Hpricot, but I need the links to be usable by
Mechanize.
Peter S. (Guest)
on 2008-10-15 21:44
(Received via mailing list)
On 2008.10.15., at 19:08, Ruby N. wrote:

>
> Is there a way to select links in a scraped mechanize page using XPath
> selectors ?
>
> For example...all links on the second TABLE on the page.
>
>
> I know it is possible with hpricot but i need the links to be used by
> mechanize.

From the Mechanize guide
(http://mechanize.rubyforge.org/mechanize/files/GUI...):

Mechanize uses hpricot to parse html. What does this mean for you? You
can treat a mechanize page like an hpricot object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using hpricot methods:
agent.get('http://someurl.com/').search("//p[@class='posted']...)
HTH,
Peter
Patrick L. (Guest)
on 2009-02-19 00:32
Peter S. wrote:
> On 2008.10.15., at 19:08, Ruby N. wrote:
>
>>
>> Is there a way to select links in a scraped mechanize page using XPath
>> selectors ?
>>
>> For example...all links on the second TABLE on the page.
>>
>>
>> I know it is possible with hpricot but i need the links to be used by
>> mechanize.
>
>  From the Mechanize guide
> (http://mechanize.rubyforge.org/mechanize/files/GUI...
> ):
>
> Mechanize uses hpricot to parse html. What does this mean for you? You
> can treat a mechanize page like an hpricot object. After you have used
> Mechanize to navigate to the page that you need to scrape, then scrape
> it using hpricot methods:
> agent.get('http://someurl.com/').search("//p[@class='posted']...)
> HTH,
> Peter

Wait a minute, it says the total opposite on the Mechanize page.  But it
definitely explains why it's not being friendly with nokogiri...

http://mechanize.rubyforge.org/mechanize/

Mechanize uses nokogiri to parse html. What does this mean for you? You
can treat a mechanize page like a nokogiri object. After you have used
Mechanize to navigate to the page that you need to scrape, then scrape
it using nokogiri methods:

agent.get('http://someurl.com/').search(".//p[@class='posted'...
Vvv V. (Guest)
on 2014-09-23 17:14
.search("//a")