Finding the size of a file on a web server?

I’m writing a little script to help automate downloading videos off of
YouTube, and I would like to know how to figure out the size of the
video files I will be downloading. I’ve experimented briefly with
Net::HTTP, but it’s a little beyond me, as I haven’t been able to
figure it out.

The URLs I’m using look like this:

http://www.youtube.com/get_video?video_id=put_real_video_id_here

and I’m getting the video id with a simple regexp. The trouble isn’t
finding the file size, per se, but finding it before I download the
entire thing.

I don’t know if this is possible or not, but any help on the subject
would be greatly appreciated!

Thank You

“CBlair1986” writes:

> I’m writing a little script to help automate downloading videos off of
> YouTube, and I would like to know how to figure out the size of the
> video files I will be downloading. I’ve experimented briefly with
> Net::HTTP, but it’s a little beyond me, as I haven’t been able to
> figure it out.

Well, this depends on what you want to do - do you want to ask for the
size of each file separately, or is it acceptable if you ask to get
the download, and can get information about the size of the file the
instant that the download starts?

In the first case, it depends on what information YouTube is willing
to give out in response to a HEAD request - not every web server will
give you a Content-Length header in a HEAD response, or even
necessarily allow HEAD requests.

To test the first case, try this in irb:
require 'net/http'
myvidlen = Net::HTTP.start('www.youtube.com', 80) {|http|
  http.head('/get_video?video_id=real_video_id').content_length
}
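
If the server refuses HEAD or leaves out the header, content_length
just comes back as nil, so it may be worth wrapping the check. This is
only a rough sketch: remote_file_size is a made-up helper name (not
part of Net::HTTP), and it assumes that anything other than a 2xx
response should be treated as "size unknown".

```ruby
require 'net/http'

# Hypothetical helper, not part of Net::HTTP: ask for the headers only
# and return the reported size in bytes, or nil if it cannot be known.
def remote_file_size(host, path, port = 80)
  Net::HTTP.start(host, port) do |http|
    response = http.head(path)
    # Assumption: only a 2xx response describes the file itself.
    # A redirect or an error says nothing useful about its size.
    if response.is_a?(Net::HTTPSuccess)
      response.content_length # nil when there is no Content-Length header
    else
      nil
    end
  end
end

# Example call (the video id is a placeholder, as above):
#   size = remote_file_size('www.youtube.com', '/get_video?video_id=real_video_id')
#   puts size ? "#{size} bytes" : "size not available"
```

A nil result means you will not know the size until the download
itself has started.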

For the second case, you’re going to need to do something like this:
require 'net/http'
Net::HTTP.start('www.youtube.com', 80) {|http|
  http.request_get('/get_video?video_id=real_video_id') {|resp|
    # headers have arrived here; the body has not been read yet
    myvidlen = resp.content_length
    puts "Video real_video_id has length #{myvidlen}"
    # use the length to do something, like choose a directory
    File.open('whereISaveMyVideo.whatever', 'wb') {|f|
      # stream the body to disk one chunk at a time
      resp.read_body { |data|
        f.write(data)
      }
    }
  }
}

Well, the first method doesn’t work, and the second method isn’t
what I’m looking for. I’d like to be able to confirm whether I
actually want to download the file first. Thanks anyways.