Trying to download files using WWW::Mechanize


#1

Hi all,

Ruby 1.8.4
www-mechanize 0.4.5

I’ve got a web page. On that page are a series of links to .csv files.
I need a way to download a particular csv file. This file can either be
loaded into memory or onto the local filesystem - either way is fine.

I’ve gotten this far:

require 'mechanize'
include WWW

agent = Mechanize.new
page = agent.get(url)

page.links.each { |link|
  p link
}

With that, I can see the links to the .csv files, which look like this
on inspection:

#<WWW::link:0x33945a0 @node= … </>,
@text="foo_May_29_…>", @href="foo_May_29_2006.csv">
#<WWW::link:0x3393898 @node= … </>,
@text="foo_May_30_…>", @href="foo_May_30_2006.csv">

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the ‘text’ method (based on the
examples file) but that didn’t seem to work for me.

Thanks,

Dan



#2

On Wed, May 31, 2006 at 03:16:45AM +0900, Berger, Daniel wrote:

Try something like this:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get(ARGV[0])

# Follow every link on the page and collect the response bodies
bodies = []
page.links.each { |link|
  puts "Clicking '#{link.text}'"
  bodies << agent.click(link).body
}

p bodies

Or even shorter:

agent = WWW::Mechanize.new

bodies = []
agent.get(ARGV[0]).links.each { |link|
  bodies << agent.click(link).body
}

p bodies
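
Once agent.click(link).body has handed back the file contents as a string, getting them onto the local filesystem is ordinary Ruby. Here is a minimal sketch, assuming the body is a String and that the last segment of the link's href makes a sensible local file name. The save_body helper and the sample data are hypothetical, not part of Mechanize:

```ruby
require 'tmpdir'

# Hypothetical helper: write a downloaded body into dir, naming the file
# after the last path segment of the link's href.
def save_body(body, href, dir)
  path = File.join(dir, File.basename(href))
  # Binary mode avoids newline translation on Windows.
  File.open(path, 'wb') { |f| f.write(body) }
  path
end

# Stand-ins for agent.click(link).body and link.href
body = "date,value\n2006-05-29,42\n"
href = 'foo_May_29_2006.csv'

saved_path = save_body(body, href, Dir.mktmpdir)
puts "Saved #{File.size(saved_path)} bytes to #{saved_path}"
```

In a real script you would pass the actual body and href from each link, and a directory of your choosing instead of a temporary one.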

–Aaron


#3

Berger, Daniel wrote:


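How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the ‘text’ method (based on the
examples file) but that didn’t seem to work for me.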

I’m thinking you need to grab the href value, glom it onto the base URL,
and use that with, say, open-uri, to fetch it.

page.links.each { |link|
  # Escape the dot so it only matches a literal ".csv" extension
  if link.href =~ /\.csv$/
    full_url = url + link.href
    # Go read that URL …
  end
}
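
One caveat with gluing the href onto the base URL: plain string concatenation only gives a valid address when the base ends in a slash. Ruby's standard uri library can resolve a relative href against the page URL the way a browser would. A small sketch, where example.com, the paths, and the absolute_url helper are all made-up placeholders:

```ruby
require 'uri'

# Hypothetical page URL; the real one is whatever the page was fetched from
page_url = 'http://example.com/reports/index.html'

# Resolve an href found on the page against the URL of the page itself.
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

puts absolute_url(page_url, 'foo_May_29_2006.csv')
# prints http://example.com/reports/foo_May_29_2006.csv
puts absolute_url(page_url, '/archive/foo_May_30_2006.csv')
# prints http://example.com/archive/foo_May_30_2006.csv
```

The resulting string can then be passed to open-uri, or back to the Mechanize agent's get method.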


James B.

“In Ruby, no one cares who your parents were, all they care
about is if you know what you are talking about.”

  • Logan C.

#4

Hey Dan.

On Wed, May 31, 2006 at 03:16:45AM +0900, Berger, Daniel wrote:

How do I grab a particular file and load it into memory or onto the
local filesystem? I tried using the ‘text’ method (based on the
examples file) but that didn’t seem to work for me.

I misread the question the first time, so I’ll try again! The text
method helps you match the text displayed. For example, a link that
looks like this:

<a href="…">Hello World!</a>

Would be found like this:

page.links.text('Hello World!').first

Mechanize returns an array because there could be multiple links that
have that text. You can also use a regular expression like this:

page.links.text(/Hello World!/).first

Or, say you need to find all files whose ‘href’ ends in ‘.csv’, you
could do this:

page.links.href(/\.csv$/).each { |link|
  puts agent.click(link).body
}
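
A note on the pattern: an unescaped dot in a regular expression matches any character, so it is worth escaping when filtering on a file extension. The sketch below shows the difference in plain Ruby. The Link struct is only a stand-in for the link objects Mechanize returns, and the hrefs and the csv_links helper are made up for illustration:

```ruby
# Stand-in for Mechanize's link objects (only href and text are modelled)
Link = Struct.new(:href, :text)

links = [
  Link.new('foo_May_29_2006.csv', 'May 29'),
  Link.new('foo_May_30_2006.csv', 'May 30'),
  Link.new('download?type=xcsv',  'Other'),
  Link.new('index.html',          'Home')
]

# Keep only links whose href ends in a literal ".csv"
def csv_links(links)
  links.select { |link| link.href =~ /\.csv\z/ }
end

csv_links(links).each { |link| puts link.href }
```

With the unescaped pattern /.csv$/, the 'download?type=xcsv' entry would also be selected, because the dot happily matches the 'x'.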

Hope this helps!

–Aaron