Accessing PDF Metadata and Page Thumbnails

Ben_Gribaudo · July 26, 2007, 6:34pm

Hello,

I am putting together a PDF archive of our corporate newsletters. I’d
like to iterate through a directory of PDFs, read their metadata (title,
description, etc.) and use this info to dynamically generate an RHTML
index page. There are several Ruby PDF libraries out there but they seem
inclined towards creating PDFs instead of reading them. Any
recommendations on a library to read PDF metadata?

It would be neat to not only read metadata but also to pull the PDF’s
first page’s thumbnail out as an image. This would allow dynamic
creation of an index page that looks like this:
http://www.reviveourhearts.com/difference/newsletter/newsletter_archive.php

Any thoughts?

Thanks,
Ben

Ben_Gribaudo · July 27, 2007, 1:46pm

Excerpts from Ben Gribaudo’s message of Thu Jul 26 19:33:32 +0300 2007:

> first page’s thumbnail out as an image. This would allow dynamic
> creation of an index page that looks like this:
> http://www.reviveourhearts.com/difference/newsletter/newsletter_archive.php
>
> Any thoughts?

Have a look at http://extractor.rubyforge.org . You need libextractor
and its headers to compile it though. Would that work for you?

> Thanks,
> Ben

–
Eugen Minciu.

Wasting valuable time since 1985.
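
For reference, the workflow asked about at the top of the thread (walk a directory of PDFs, read each file's metadata, render an index page) might be sketched roughly as follows. This is only an illustration under stated assumptions: `Entry`, `collect_entries`, `render_index`, and `INDEX_TEMPLATE` are hypothetical names, and the `reader` argument is a placeholder for whichever metadata library ends up being used (for example the libextractor binding suggested above, whose API is not shown in the thread and is not assumed here).

```ruby
require "erb"

# Hypothetical record for one newsletter; not part of any PDF library.
Entry = Struct.new(:file, :title, :description)

# Walk +dir+ and build one Entry per PDF, sorted by file name.
# +reader+ is any callable that takes a path and returns a Hash with
# :title and :description keys (an assumption; adapt it to your library).
def collect_entries(dir, reader)
  Dir.glob(File.join(dir, "*.pdf")).sort.map do |path|
    meta = reader.call(path) || {}
    title = meta[:title].to_s.strip
    # Fall back to the file name when the PDF carries no title.
    title = File.basename(path, ".pdf") if title.empty?
    Entry.new(File.basename(path), title, meta[:description].to_s.strip)
  end
end

# Illustrative ERB template; in a Rails app this markup would normally
# live in an .rhtml view instead of a string constant.
INDEX_TEMPLATE = <<~HTML
  <ul>
  <% entries.each do |e| %>
    <li><a href="<%= ERB::Util.html_escape(e.file) %>"><%= ERB::Util.html_escape(e.title) %></a>
        <%= ERB::Util.html_escape(e.description) %></li>
  <% end %>
  </ul>
HTML

# Render the index as an HTML fragment. Metadata is HTML-escaped because
# it comes from the PDF files and may contain markup characters.
def render_index(entries)
  ERB.new(INDEX_TEMPLATE).result(binding)
end
```

Passing the reader in as a callable keeps the directory walk and the template independent of the metadata library, so a stub lambda can stand in while the page layout is being worked out. Thumbnails are not covered by this sketch.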