PDF to HTML converter for Ruby?

Does anyone know of a good package that can convert a PDF into HTML?
Cross-platform compatible is a plus, but I can live with Linux-only if
it comes to that.

I’ve never been able to find a reliable, open source solution to this
problem. If anyone knows of one, I’d really like to know about it as
well.
Here are some options that I know of:

If you just have a few PDFs, you can save them as HTML from Acrobat (not
Reader), or with Adobe’s online conversion tool.

So if a commercial, non-Ruby solution is OK for you, Adobe obviously can
do what you want, and the capabilities to convert many documents are
probably available in their server products. Or you might be able to get
what you want through the Acrobat SDK.

There is a commercial product called PDFLib (http://www.pdflib.org). It
works with almost every major programming language, including Ruby, and
has a ton of features. No direct conversion to HTML, but you can extract
text with PDFLib TET and then mark it up with Ruby.

The only totally open option I know of is PDFBox.
It’s a Java library of PDF functions, including the ability to extract
text, similar to PDFLib TET, but again you’re on your own to mark it up as
HTML.
