PDF to HTML converter for Ruby?

Does anyone know of a good package that can convert a PDF into HTML?
Cross-platform compatible is a plus, but I can live with Linux-only if
it comes to that.

I’ve never been able to find a reliable, open source solution to this
problem. If anyone knows of one, I’d really like to hear about it as
well.

Here are some options that I know of:

If you just have a few PDFs, you can save them as HTML from Acrobat (not
Reader), or with Adobe’s online conversion tool at:

http://www.adobe.com/products/acrobat/access_onlinetools.html

So if a commercial, non-Ruby solution is OK for you, Adobe obviously can do
what you want, and the appropriate capabilities to convert many documents are
probably available in their server products. Or you might be able to get at
what you want through the Acrobat SDK.

There is a commercial product called PDFLib (http://www.pdflib.org). It
works with almost every major programming language, including Ruby, and has
a ton of features. No direct conversion to HTML, but you can extract text
with PDFLib TET and then mark it up with Ruby.

The only totally open option I know of is PDFBox (http://www.pdfbox.org).
It’s a Java library of PDF functions, including the ability to extract text
similar to PDFLib TET, but again you’re on your own to mark it up as HTML.
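
To give a rough idea of what "mark it up with Ruby" could look like, here is a
minimal sketch. It assumes you already have the extracted text as a plain
string (from TET, PDFBox, or anything else) and that paragraphs are separated
by blank lines, which may not hold for every PDF. The text_to_html helper and
its title parameter are made up for illustration; only CGI.escapeHTML comes
from Ruby's standard library.

```ruby
require "cgi"

# Hypothetical helper: wraps plain extracted text in a bare-bones HTML page.
# Assumes paragraphs are separated by one or more blank lines.
def text_to_html(text, title: "Converted PDF")
  paragraphs = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)

  body = paragraphs.map do |para|
    # Escape &, <, > and quotes, then join hard-wrapped lines with spaces.
    "<p>#{CGI.escapeHTML(para).gsub(/\s*\n\s*/, ' ')}</p>"
  end

  <<~HTML
    <html>
    <head><title>#{CGI.escapeHTML(title)}</title></head>
    <body>
    #{body.join("\n")}
    </body>
    </html>
  HTML
end
```

Something like `File.write("out.html", text_to_html(File.read("extracted.txt")))`
would then give you a page of plain paragraphs. Headings, tables, and layout
are lost with this approach, since the extracted text no longer carries them.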

HTH,
Jeff

