What is the best way to download files from the internet (HTTP) that
are greater than 1GB?
Here’s the story in whole…
I was trying to use Ruby Net::HTTP to manage a download from
Wikipedia… Specifically, all current versions of the English one…
But anyway, as I was downloading it, I got a memory error when I ran
out of RAM.
My current code:
open(@opts[:out], "w") do |f|
  http = Net::HTTP.new(@url.host, @url.port)
  c = http.start do |http|
    a = Net::HTTP::Get.new(@url.page)
    http.request(a)
  end
  f.write(c.body)
end
I was hoping there’d be some method that I can attach a block to, so
that for each byte it will call the block.
Is there some way to write the bytes to the file as they come in, not
at the end?
Thanks,
~Ari
“I don’t suffer from insanity. I enjoy every minute of it” --1337est
man alive
Is there some way to write the bytes to the file as they come in, not at
the end?
Not precisely what you asked for, but this is how ara t. howard told me
to download large files, using open-uri. This reads one 8KB chunk at a
time:

require 'open-uri'

open(uri) do |fin|
  open(File.basename(uri), "w") do |fout|
    while (buf = fin.read(8192))
      fout.write buf
    end
  end
end
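Closer to what was actually asked for: Net::HTTP#request also accepts a
block, and inside it HTTPResponse#read_body yields the body chunk by
chunk as it comes off the socket. A sketch, wrapped in a hypothetical
helper (the name and arguments are invented for illustration, not taken
from Ari’s code):

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: streams the response body straight to disk,
# one chunk at a time, so only a single buffer's worth of the body
# is ever held in memory.
def stream_download(url, dest)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # Passing a block to #request keeps the body unread until we ask;
    # #read_body then yields each chunk as it arrives.
    http.request(request) do |response|
      File.open(dest, "wb") do |f|
        response.read_body { |chunk| f.write(chunk) }
      end
    end
  end
end
```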
But doesn’t open-uri download the whole thing to your compy? I was about
to use it, but then I ran it in irb and saw it returned a file object.
Isn’t that what you want to happen? I thought your question was about
how to download it in small chunks so it’s not all in memory at the same
time. This code downloads the whole file, but 8KB at a time.
No, I thought that when you use Kernel#open with open-uri, it FIRST
downloads the entire 1GB file to your temp folder, and THEN runs your
block on that file in temp.
Is there some reason to not use wget or curl? Those are both
written already. What are you hoping to do with the files you
download?
I’m trying to write wget/axel in ruby. Plus add torrent support!
Is there some particular reason not to use Aria2? It’s already written.
Yes, the UI sucks, and it cannot download multifile torrents from the
web as well. But to compete with that you would have to make something
really good.
Well then I have a competitor!
I’m really writing this just for practice, but also because I think
the world needs a ruby downloader.
Maybe to give myself a fighting chance against aria2, I’ll lower the
version numbers instead of raising them.
No, I thought when you use Kernel#open with open-uri, it FIRST downloads
the entire 1GB file to your temp folder, and THEN runs your block on
that file in temp
Interesting. I just tried downloading a 6.1MB file with open-uri and
didn’t see that behavior. I’m using Ruby 1.8.6 on OS X 10.5.
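For what it’s worth, both readings may be partly right: stock open-uri
does read the entire response before running your block, but only
bodies up to OpenURI::Buffer::StringMax bytes are kept in memory as a
StringIO; anything larger is spilled to a Tempfile. So a 1GB download
would land in the temp folder first, without eating RAM. A quick check
of the cutoff:

```ruby
require 'open-uri'

# open-uri reads the whole response before it yields to your block:
# bodies up to OpenURI::Buffer::StringMax bytes stay in a StringIO in
# memory, and anything larger is spilled to a Tempfile on disk first.
puts OpenURI::Buffer::StringMax

# Hypothetical check against a large URL (not run here):
# open("http://example.com/big.iso") { |f| p f.class }  # a Tempfile
```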
That’s good then! I’ll test it out myself juuuust to make sure. I
don’t want to waste 4GB of space when I only need 2GB.
open-uri uses Net::HTTP, of course. Am I correct?
Net::HTTP wraps connections in a Timeout, which is REALLY screwing
with my downloads of large files.
Will probably get some monkeys to patch that for me.
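No monkey patch should actually be needed: the Timeout wrapper is
controlled by plain accessors on the Net::HTTP object, and a nil
read_timeout disables the per-read deadline. A sketch (the host is just
an example, not taken from the thread):

```ruby
require 'net/http'

# Net::HTTP.new does not connect yet, so the timeouts can be tuned
# before the first request is made.
http = Net::HTTP.new("dumps.wikimedia.org", 80)
http.open_timeout = 30    # still give up on a connect after 30s
http.read_timeout = nil   # but never abort a slow, long-running read
```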
I think you should definitely use BitTorrent rather than HTTP. I spoke
to the maintainer/developer a while ago and I think ruby-torrent isn’t
being actively worked on, but it could definitely save you some
headaches if you start there.
(I have homework to do)
Are you insane? Firstly it already has a RubyForge page with download
files, secondly I mentioned having spoken to the maintainer - which
would mean the maintainer was not me - and thirdly who would say yes
to that?
(And fourth, kind of a tangent, but who expects an O’Reilly book on
Ruby to have accurate information?)
You mean AFTER you have sniped at the newbies, right? The kettle, the
pot, et cetera.
What are you talking about? I don’t get it. Yes, after I snipe at
newbies, I say wouldn’t it be great if we could just let Zed handle
it. Because he’s better at it. Where does a pot and a kettle enter the
equation?