I’m looking for libraries to do text extraction from MS Office and PDF
file formats. Also looking for libraries to do HTML rendering of
documents in the same formats. I know of a couple of commercial
libraries from Oracle and Autonomy, but they only have C and/or Java
APIs. I also found this project http://poi.apache.org/poi-ruby.html.
Are there other open source alternatives, and/or alternatives with Ruby