Downloading an Array of PDF Files and Extracting Metadata

dandiebolt · April 13, 2006, 3:37pm

Let me simplify the description of a task I have to perform.

I have an array of URLs that all point to PDF files. I need to iterate
through the array, download each PDF file, and extract some simple
metadata such as title, author, date, etc. that is known to be in each
PDF file, either in a known place or locatable with a known pattern
(regular expression). The output would be a directory full of the PDF
files and an HTML index file that lists the metadata along with a
hyperlink to each PDF file.

Do any Ruby modules exist to help with downloading PDF files and
extracting text from them? Can the open-uri module do this?
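
A rough sketch of the workflow described above might look like the
following. It is only an illustration under several assumptions:
open-uri handles the download half only, so the text of each PDF is
assumed to have been extracted already by some separate tool; the
"Title:", "Author:" and "Date:" patterns are hypothetical stand-ins for
whatever patterns the real files follow; and the helper names
(local_name, download_pdf, extract_metadata, build_index) are made up
for this example. URI.open is the spelling used by current Ruby
versions; older versions called plain open after requiring open-uri.

```ruby
require "open-uri"
require "uri"
require "cgi"

# Hypothetical patterns; the real ones depend on how the PDFs are laid out.
PATTERNS = {
  title:  /^Title:[ \t]*(.+)$/,
  author: /^Author:[ \t]*(.+)$/,
  date:   /^Date:[ \t]*(.+)$/
}.freeze

# Derive a local file name from the last segment of the URL path.
def local_name(url)
  File.basename(URI.parse(url).path)
end

# Download one PDF into dir in binary mode and return the saved path.
def download_pdf(url, dir)
  path = File.join(dir, local_name(url))
  URI.open(url, "rb") { |remote| File.binwrite(path, remote.read) }
  path
end

# Apply each pattern to text that was already extracted from a PDF.
# Fields whose pattern does not match are left as nil.
def extract_metadata(text)
  PATTERNS.each_with_object({}) do |(field, pattern), meta|
    match = pattern.match(text)
    meta[field] = match ? match[1].strip : nil
  end
end

# Build an HTML index: one table row per file, with a link to the PDF
# followed by one cell per metadata field.
def build_index(entries)
  rows = entries.map do |file, meta|
    name  = CGI.escapeHTML(file)
    cells = PATTERNS.keys.map { |f| "<td>#{CGI.escapeHTML(meta[f].to_s)}</td>" }.join
    %(<tr><td><a href="#{name}">#{name}</a></td>#{cells}</tr>)
  end
  "<html><body><table>\n#{rows.join("\n")}\n</table></body></html>\n"
end
```

With these pieces, the loop would call download_pdf for each URL, pass
the extracted text of each saved file to extract_metadata, collect
[file name, metadata] pairs, and write the result of build_index to an
index.html file in the same directory.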

Thanks in advance.