Forum: Ruby Trying to download files using WWW::Mechanize

Berger, Daniel (Guest)
on 2006-05-30 22:20
(Received via mailing list)
Hi all,

Ruby 1.8.4
www-mechanize 0.4.5

I've got a web page.  On that page are a series of links to .csv files.
I need a way to download a particular csv file.  This file can either be
loaded into memory or onto the local filesystem - either way is fine.

I've gotten this far:

require 'mechanize'
include WWW

mech = Mechanize.new
page = mech.get(url)

page.links.each{ |link|
   p link
}

With that, I can see the links to the .csv files, which look like this
on inspection:

#<WWW::Link:0x33945a0 @node=<a href='foo_May_29_2006.csv'> ... </>,
@text="foo_May_29_..>", @href="foo_May_29_2006.csv">
#<WWW::Link:0x3393898 @node=<a href='foo_May_30_2006.csv'> ... </>,
@text="foo_May_30_..>", @href="foo_May_30_2006.csv">

How do I grab a particular file and load it into memory or onto the
local filesystem?  I tried using the 'text' method (based on the
examples file) but that didn't seem to work for me.

Thanks,

Dan


Aaron P. (Guest)
on 2006-05-30 22:41
(Received via mailing list)
On Wed, May 31, 2006 at 03:16:45AM +0900, Berger, Daniel wrote:
> }
Try something like this:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get(ARGV[0])

bodies = []
page.links.each { |link|
  puts "Clicking '#{link.text}'"
  bodies << agent.click(link).body
}

p bodies

Or even shorter:

agent = WWW::Mechanize.new

bodies = []
agent.get(ARGV[0]).links.each { |link|
  bodies << agent.click(link).body
}

p bodies

--Aaron
James B. (Guest)
on 2006-05-30 22:48
(Received via mailing list)
Berger, Daniel wrote:

 > ...
> examples file) but that didn't seem to work for me.
>

I'm thinking you need to grab the href value, glom it onto the base URL,
and use that with, say, open-uri, to fetch it.

page.links.each{ |link|
    if link.href =~ /\.csv$/
      full_url = url + link.href
      # Go read that URL  ...
    end
}
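
One caveat with plain concatenation: if url points at a page such as .../index.html rather than at a directory, url + link.href gives a broken address. Here is a minimal sketch of the same idea that swaps in URI.join from the standard library. The helper names csv_urls and fetch and the example.com address are made up for illustration, and fetch is only defined, not called, because it makes a real network request.

```ruby
require 'uri'
require 'open-uri'

# Keep only the hrefs that end in .csv and resolve each one against the
# URL of the page it was found on. (csv_urls is a hypothetical helper.)
def csv_urls(page_url, hrefs)
  csv_hrefs = hrefs.select { |href| href.to_s =~ /\.csv\z/ }
  csv_hrefs.map { |href| URI.join(page_url, href).to_s }
end

# Read the body at url into a string via open-uri.
# (fetch is a hypothetical helper; calling it hits the network.)
def fetch(url)
  URI.parse(url).read
end

# Example with made-up values:
urls = csv_urls('http://example.com/reports/index.html',
                ['foo_May_29_2006.csv', 'about.html'])
puts urls
```

With the page from the original question, something like csv_urls(url, page.links.map { |l| l.href }) should give the list of addresses to hand to fetch.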

--
James B.

"In Ruby, no one cares who your parents were, all they care
  about is if you know  what you are talking about."
   - Logan C.
Aaron P. (Guest)
on 2006-05-30 23:01
(Received via mailing list)
Hey Dan.

On Wed, May 31, 2006 at 03:16:45AM +0900, Berger, Daniel wrote:
> How do I grab a particular file and load it into memory or onto the
> local filesystem?  I tried using the 'text' method (based on the
> examples file) but that didn't seem to work for me.
>

I misread the question the first time, so I'll try again!  The text
method matches against the text displayed for a link.  For example, a
link that looks like this:

<a href="http://google.com">Hello World!</a>

Would be found like this:

  page.links.text('Hello World!').first

Mechanize returns an array because there could be multiple links that
have that text.  You can also use a regular expression like this:

  page.links.text(/Hello World!/).first

Or, say you need to find all links whose 'href' ends in '.csv', you
could do this:

  page.links.href(/\.csv$/).each { |link|
    puts agent.click(link).body
  }
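
To get a file onto the local filesystem instead of printing it, the body can be written out with plain File calls. This is a minimal sketch, assuming body is the string returned by agent.click(link).body. save_body is a hypothetical helper, not part of Mechanize, and the downloads directory name is made up.

```ruby
require 'fileutils'

# Write body into dir, naming the file after the last path segment of href.
# Returns the path that was written. (save_body is a hypothetical helper.)
def save_body(body, href, dir = '.')
  FileUtils.mkdir_p(dir)                        # create dir if it is missing
  path = File.join(dir, File.basename(href))
  File.open(path, 'wb') { |f| f.write(body) }   # binary mode, no newline translation
  path
end

# Possible use with the loop above (left commented out, since it needs
# Mechanize and a live page):
#
#   page.links.href(/\.csv$/).each { |link|
#     save_body(agent.click(link).body, link.href, 'downloads')
#   }
```

An href that carries a query string would end up with that string in the file name, so this naming scheme only suits plain links like the ones in the question.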

Hope this helps!

--Aaron