Saving a PDF locally


#1

Hi Ruby-folks,

I am currently working on a small program that saves copies of websites
locally to my hard disk. For normal HTML pages this works as expected.
But now I am struggling with binary files such as PDFs.

Here is my code:

open("http://www.somewebpage.com/atestfile.pdf"){|u|
  targetFile = File.new("test.pdf","w")
  u.each_byte {|ch|
    targetFile.putc ch
  }
}

The resulting local file cannot be opened with my PDF reader. When I
open it in an editor, there seem to be only numbers in the file (-> not
binary).

Instead of putc I tried write, which did not work either.

Any hints?

Yochen


#2

Yochen G. wrote:

Hi Ruby-folks,

I am currently working on a small program that saves copies of websites
locally to my hard disk. For normal HTML pages this works as expected.
But now I am struggling with binary files such as PDFs.

targetFile = File.new("test.pdf","w")

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of as lines of text that should be terminated with CR-LF.
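
To illustrate the "wb" advice, here is a minimal sketch that is not from the thread: it writes a few bytes in binary mode and reads them back unchanged. The helper name binary_round_trip and the sample bytes are made up for illustration. On Linux and OSX, "w" and "wb" should produce the same bytes on disk; the line-ending translation only matters on Windows.

```ruby
require "tmpdir"

# Sample bytes (an assumption for illustration): a PDF-style header plus
# bytes that text-mode newline translation on Windows could alter.
PDF_LIKE_BYTES = "%PDF-1.4\n\x00\xFF\r\n".b

# Hypothetical helper: write raw bytes in binary mode, then read them back.
def binary_round_trip(path, bytes)
  # The block form of File.open closes the file when the block ends.
  File.open(path, "wb") { |f| f.write(bytes) }
  File.binread(path)
end

ROUND_TRIP = Dir.mktmpdir do |dir|
  binary_round_trip(File.join(dir, "test.bin"), PDF_LIKE_BYTES)
end
```

If ROUND_TRIP equals PDF_LIKE_BYTES, the write path is not altering the data.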


#3

Joel VanderWerf wrote:

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of as lines of text that should be terminated with CR-LF.

Well, although I am working on OSX (forgot to mention), I tried your
hint, but that did not work (as expected). Hm… Any other ideas?

-Yochen


#4

On May 30, 2006, at 12:29 AM, Yochen G. wrote:

open("http://www.somewebpage.com/atestfile.pdf"){|u|
  targetFile = File.new("test.pdf","w")
  u.each_byte {|ch|
    targetFile.putc ch
  }
}

Try:

require 'open-uri'

# Open the remote file in binary mode and write it out in one go.
open("http://www.somewebpage.com/atestfile.pdf", 'rb'){|u|
  File.open("test.pdf", "wb") do |f|
    f.write(u.read)
  end
}

– Daniel
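
For readers on a current Ruby: plain open no longer accepts URLs in Ruby 3.x, so the same idea is usually written with URI.open from open-uri. Below is a hedged sketch along those lines. The helper save_binary and its opener: parameter are hypothetical, added only so the snippet can be exercised without a network connection.

```ruby
require "open-uri"

# Hypothetical helper (not from the thread): fetch `url` and save the body
# to `path` without any text-mode translation.
# `opener` defaults to URI.open; it is injectable so the helper can be
# tried offline with a stand-in that yields any readable object.
def save_binary(url, path, opener: URI.method(:open))
  opener.call(url, "rb") do |remote|
    # "wb" keeps the bytes untouched; the block form closes the file.
    File.open(path, "wb") { |f| f.write(remote.read) }
  end
end
```

Usage would look like save_binary("http://www.somewebpage.com/atestfile.pdf", "test.pdf"), with the URL being the placeholder from the original post.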


#5

Yochen G. wrote:

Joel VanderWerf wrote:

Use "wb" instead of "w". On Windows, this treats the data as binary
instead of as lines of text that should be terminated with CR-LF.

Well, although I am working on OSX (forgot to mention), I tried your
hint, but that did not work (as expected). Hm… Any other ideas?

-Yochen

Sorry! I jumped to conclusions about the problem.

The following works for me on linux. Can’t make any predictions about
OSX, tho’.

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf"){|u|
  targetFile = File.new("test.pdf","w")
  u.each_byte {|ch|
    targetFile.putc ch
  }
}

Why are you doing it a byte at a time? This seems to run much faster for
me:

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf") do |u|
  targetFile = File.new("test.pdf","w")
  loop do
    dat = u.read(1000)
    break unless dat
    targetFile.write dat
  end
end
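
A slightly tidier variant of the chunked loop, as a sketch only: it takes any object that responds to read(n), opens the target with "wb", and uses the block form of File.open so the file is closed when the copy finishes. The name copy_in_chunks and the 16 KB default chunk size are assumptions for illustration, not something from this thread.

```ruby
# Hypothetical helper: copy everything from `source` (anything with
# read(n), e.g. the object yielded by open-uri) into the file at `path`.
# Returns the number of bytes written.
def copy_in_chunks(source, path, chunk_size = 16 * 1024)
  total = 0
  File.open(path, "wb") do |out|
    # read(n) returns nil at end of input, which ends the loop.
    while (chunk = source.read(chunk_size))
      total += out.write(chunk)
    end
  end
  total
end
```

Reading in chunks keeps memory use bounded for large files, whereas u.read pulls the whole body into one string. The standard library's IO.copy_stream does a similar job if you would rather not write the loop yourself.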


#6

Daniel H. wrote:

Try:

open("http://www.somewebpage.com/atestfile.pdf", 'rb'){|u|
  File.open("test.pdf", "wb") do |f|
    f.write(u.read)
  end
}

Thanks, Daniel, your solution works as well! And it is even shorter!

Slowly I am starting to wonder why I didn't come up with a working
solution myself ;-]

– Yochen


#7

Joel VanderWerf wrote:

Why are you doing it a byte at a time?

Just ran out of ideas ;-)

This seems to run much faster for me:

require 'open-uri'

open("http://path.berkeley.edu/~vjoel/redshift/ruby-sdforum.pdf") do |u|
  targetFile = File.new("test.pdf","w")
  loop do
    dat = u.read(1000)
    break unless dat
    targetFile.write dat
  end
end

Fantastic!

Thanks a lot.

-Yochen