How to automate downloading a PDF from the web in Ruby

Hi,

I want to download a PDF file from a website, but I cannot use Rails’
“send_file” because I want to do it in plain Ruby.

e.g.: http://www.example.com/abc.pdf

Given the URL above, I just want to download “abc.pdf” automatically
rather than clicking on save.

I am looking for some good solutions and suggestions.

Thanks,
Priyank S.

On a Linux box you can happily use wget. Here is a simple use-and-throw
script that I used to download video lectures (of Design of
Machine Elements :)) from NPTEL[1]. I hope this gives you an idea.

require 'rubygems'
require 'nokogiri'
require 'open-uri'

proxy = ENV["http_proxy"]
page = "http://nptel.iitm.ac.in/video.php?courseId=1063"

# get the title of all the lectures to use as a filename later
# (URI.open needs Ruby 2.5+; older Rubies used plain open)
doc = Nokogiri::HTML(URI.open(page, :proxy => proxy))
titles = doc.search("td.videolink a").map(&:text)

# download url follows a common pattern
# construct the url, and download it

# lectures 1 to 9 have a zero-padded number in the file name
(1..9).each do |i|
  url = "http://npteldownloads.iitm.ac.in/flv/1063/lec0#{i}.flv"
  lecture = titles[i - 1]
  puts lecture
  %x|wget -c '#{url}' -O '#{lecture}'|
end

(10..40).each do |i|
  url = "http://npteldownloads.iitm.ac.in/flv/1063/lec#{i}.flv"
  lecture = titles[i - 1]
  puts lecture
  %x|wget -c '#{url}' -O '#{lecture}'|
end

[1] http://nptel.iitm.ac.in/
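
One thing to watch with the script above: a scraped title that contains a quote or a slash would break the single-quoted shell command. A minimal sketch of one way around that, assuming you are happy to call Kernel#system with separate arguments so no shell is involved. The helper name wget_args and the ".flv" suffix are my own additions, not part of the original script.

```ruby
# Hypothetical helper: build the wget argument list for one lecture.
# Passing each argument separately to Kernel#system bypasses the shell,
# so quotes or spaces in a title cannot break the command.
def wget_args(url, title)
  # Replace path separators so the title is usable as a single file name.
  # The ".flv" suffix is an assumption; the original saved the bare title.
  filename = title.strip.gsub(%r{[/\\]}, "-") + ".flv"
  ["wget", "-c", url, "-O", filename]
end

# Usage (needs wget on the PATH and network access):
#   system(*wget_args(url, lecture)) or warn "download failed: #{url}"
```

The same argument list works on Linux and Windows as long as wget is installed.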

See if the download and unzip functions here[1] help. My fork[2] has
proxy support. I have sent a pull request too, but the developer seems
to be on leave.

It uses the ‘progressbar’ gem to show progress, but you can safely omit it
and make a dependency-free version :).

[1] https://github.com/maccman/bowline/blob/master/lib/bowline/tasks/libs.rake
[2] https://github.com/yeban/bowline/blob/proxy_support/lib/bowline/tasks/libs.rake
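
For a dependency-free version along those lines, a minimal sketch using only Net::HTTP from the standard library might look like the following. This is not the bowline code: the download name and the progress block are made up for illustration. On Ruby 2.0 and later, Net::HTTP picks up the http_proxy environment variable by default.

```ruby
require "net/http"
require "uri"

# Hypothetical helper, not the bowline code: stream +url+ into +path+ and
# yield (bytes_so_far, total_or_nil) after each chunk so the caller can
# report progress however it likes.
def download(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request_get(uri.request_uri) do |response|
      response.value # raises unless the status is 2xx (redirects are not followed)
      total = response["Content-Length"]
      total = total.to_i if total
      done = 0
      File.open(path, "wb") do |file|
        # read_body with a block hands over the body chunk by chunk,
        # so large files never have to fit in memory.
        response.read_body do |chunk|
          file.write(chunk)
          done += chunk.bytesize
          yield done, total if block_given?
        end
      end
    end
  end
  path
end

# Usage (needs network access):
#   download("http://www.example.com/abc.pdf", "abc.pdf") do |done, total|
#     print "\r#{done} of #{total || '?'} bytes"
#   end
```

The block is optional, so the progress display can be dropped or swapped for a gem later.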

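You can do it with CGI:

#!/usr/bin/ruby

# the header block ends at the first blank line, so emit exactly one,
# after the last header
print "Content-Type: application/x-unknown\r\n"
print "Content-Length: 123456\r\n"   # placeholder: must be the real size in bytes
print "Content-Disposition: attachment; filename=abc.pdf\r\n"
print "\r\n"

# output the binary file data here…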
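
Note that the CGI approach covers the serving side (making a browser save the file) rather than fetching it. A minimal sketch of the complete response, assuming the PDF sits on the server's disk. The helper name send_attachment and the application/pdf content type are my own choices, not from the post above.

```ruby
# Hypothetical helper: write CGI response headers followed by the file
# body to +out+ (standard output in a real CGI script).
def send_attachment(path, out = $stdout, download_name: File.basename(path))
  out.binmode # avoid newline translation of binary data on Windows
  out.print "Content-Type: application/pdf\r\n"
  out.print "Content-Length: #{File.size(path)}\r\n"
  out.print "Content-Disposition: attachment; filename=\"#{download_name}\"\r\n"
  out.print "\r\n" # a single blank line separates headers from the body
  File.open(path, "rb") { |file| IO.copy_stream(file, out) }
end

# Usage inside a CGI script:
#   send_attachment("/path/to/abc.pdf")
```

Taking the size from File.size keeps Content-Length in step with the file actually sent.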

The simplest pure-Ruby way to do it that I know of is to use the rio
library. I just ran this simple program and it downloaded the PDF perfectly:

require 'rubygems'
require 'rio'
rio("http://www.sqlite.org/copyright-release.pdf") > rio('sqlite-copyright-release.pdf')
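
If installing a gem is not an option, much the same thing can probably be done with open-uri from the standard library. This is a different technique from rio, sketched here under the assumption of Ruby 2.5 or later (for URI.open). The helper name fetch_to_file is hypothetical.

```ruby
require "open-uri"

# Hypothetical helper: copy the body of +url+ into +path+ using only the
# standard library. URI.open needs Ruby 2.5 or later.
def fetch_to_file(url, path)
  URI.open(url) do |remote|
    # Binary mode matters for PDFs, especially on Windows.
    File.open(path, "wb") { |file| IO.copy_stream(remote, file) }
  end
  path
end

# Usage (needs network access):
#   fetch_to_file("http://www.sqlite.org/copyright-release.pdf",
#                 "sqlite-copyright-release.pdf")
```

open-uri follows redirects on its own and raises OpenURI::HTTPError on a 4xx or 5xx response.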

-Michael

On Wed, Feb 23, 2011 at 2:46 PM, Anurag P. wrote:

On a Linux box you can happily use wget. Here is a simple use-and-throw
script that I used to download video lectures (of Design of
Machine Elements :)) from NPTEL[1]. I hope this gives you an idea.

wget also seems to work well on Microsoft Windows systems, both
standalone and run as a process from Ruby. (I use the system command rather
than %x, but that’s only because I feel more comfortable using system.)

Specifically on running wget from Ruby: I did this quite a lot about
three years ago on a dial-up connection using Microsoft Windows XP,
and much more recently I’ve been doing it on a broadband connection
using Microsoft Windows Vista. I wrap wget with some Ruby methods
(including calling Dir["path/*"] before and after running wget and
taking the difference to find the downloaded files), and that’s fairly
easy to do. (If you’d like to see the wrapping code, I’d be happy to
post it on GitHub.)
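
To give an idea of the shape of such a wrapper, here is a minimal sketch, not the actual wrapping code mentioned above. The name fetch_new_files and the runner parameter are hypothetical; the runner is only there so the command can be swapped out. It assumes wget is on the PATH and uses its -c (resume) and -P (target directory) options.

```ruby
# Hypothetical wrapper: run a download command for +url+ and return the
# files that appeared in +dir+ as a result.
def fetch_new_files(url, dir, runner: ->(*cmd) { system(*cmd) })
  before = Dir[File.join(dir, "*")]
  # system with separate arguments avoids shell quoting problems.
  ok = runner.call("wget", "-c", "-P", dir, url)
  raise "download command failed for #{url}" unless ok
  # Anything listed now but not before was created by the download.
  Dir[File.join(dir, "*")] - before
end

# Usage (needs wget and network access):
#   fetch_new_files("http://www.example.com/abc.pdf", "downloads")
```

A file that already existed and was merely resumed will not show up in the difference, so a caller that cares about resumed downloads would need to compare sizes or timestamps as well.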

There is a thread here:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/333425
“I would need a ruby “wget version” which works on linux and windows.
I would like to feed it an URL to a .tar.bz2 or .zip or .tar.gz file
and have it download. That’s what it basically should do. Right now I
use
system 'wget '+the_url
which does not work on windows easily (but will work on pretty much
all the linuxes out there)”

  1. In the thread there were some suggestions that you could probably
    just use Net::HTTP or open-uri. I used (and use) wget because (rightly
    or wrongly) I suspect it may be more robust with possibly flaky
    connections and/or large files (and I knew how to use wget, so there
    wasn’t a learning curve), but I will be experimenting with Net::HTTP,
    out of curiosity and to get downloads straight into Ruby (I hope)
    instead of via a file downloaded by wget.

  2. As I said, I found using wget from Ruby worked easily for me on
    Microsoft Windows, but I’d be interested in any experiences to the
    contrary.

  3. I seem to recall there was a wget.rb (something like that) which
    wrapped wget and which was installed with the Ruby Windows Installer,
    but I’ve just looked in my MS Windows Ruby and JRuby and couldn’t find
    it. But it should be easy to write your own wrapper for what you want
    to do.
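
On the flaky-connection worry in point 1: part of what wget does there is simply trying again. A rough sketch of a retry wrapper that could go around any pure-Ruby download call. The name with_retries and its parameters are hypothetical, and this does not resume partial files the way wget -c does.

```ruby
# Hypothetical helper: run the block, retrying up to +attempts+ times when
# it raises one of +errors+, sleeping +wait+ seconds between tries.
# The block receives the current attempt number, starting at 1.
def with_retries(attempts: 3, wait: 1, errors: [StandardError])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    raise if tries >= attempts # out of attempts: pass the last error on
    sleep wait
    retry
  end
end

# Usage (the block body is whatever download call you prefer):
#   with_retries(attempts: 5, wait: 2) { |n| puts "attempt #{n}"; fetch_it }
```

In practice it is probably wise to pass a narrower errors list (timeouts and socket errors, say) so that genuine bugs are not retried.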
