Scraping with Nokogiri while using Mechanize

luislavena · March 10, 2011, 7:28pm

The Mechanize documentation says to just start scraping with Nokogiri
once you’ve navigated to the right page with Mechanize, but this example
doesn’t seem to work for me:

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)

page.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

But this works:

require 'nokogiri'
require 'open-uri'

page = Nokogiri::HTML(open('ruby mechanize - Google Search'))
page.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

Any advice?

squawkboxed · March 11, 2011, 4:48am

Is there a form with the name ‘f’ on the page www.google.com? According
to the Mechanize instructions, you are supposed to run this first:

pp page

to identify the name of the form and the name of the form field you
want to fill in.

squawkboxed · March 11, 2011, 5:22pm

The Mechanize part works fine. I navigate to the page I want and when I
pretty print I get the page I want. The problem is I don’t know how to
scrape data from the page with Nokogiri once I’ve navigated to it. For
example, this is how you would do it with pure Nokogiri:

require 'nokogiri'
require 'open-uri'

page = Nokogiri::HTML(open('ruby mechanize - Google Search'))
page.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

But when I do it with Mechanize, it doesn’t output anything.

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)

page.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

It’s a trivial example, but for the data I’m scraping I need to use
Mechanize to navigate to the page, so I can’t just use Nokogiri. The
thing is, the Mechanize documentation says I should be able to use
regular Nokogiri methods, but they don’t seem to work for me.