PDF text search in Rails

Is there any plugin that can search PDF documents? For example, a
user should be able to search for keywords in the PDF contents.

Good morning -

On 3-Jun-08, at 1:25 AM, ripan wrote:

> Is there any plugin that can search PDF documents? For example, a
> user should be able to search for keywords in the PDF contents.

Someone submitted a patch for acts_as_solr to index documents; see
the project's Google Group.

J

> Is there any plugin that can search PDF documents?

You could try rpdf2txt (http://raa.ruby-lang.org/project/rpdf2txt/),
or JRoR (JRuby on Rails) together with one of the many Java PDF
libraries. I'm not aware of a Rails plugin for this.

> Someone submitted a patch for acts_as_solr to index documents; see
> the project's Google Group.

I didn't think Solr would do this, since it provides indexing and
querying but does not parse rich formats. However, there seems to be a
patch that extracts text (but not metadata) from rich documents into
Solr: UpdateRichDocuments on the Solr wiki. The Solr committers are
reluctant to adopt that patch, though, and would rather build a bridge
from Apache Tika to Solr, even if that is further down the road.

I did find the acts_as_solr patch here:
http://www.nabble.com/Rich-Document-support-for-solr-ruby-and-acts_as_solr-p17161561.html
But since it depends on the uncommitted Solr patch, I wouldn't count
on it being viable in the long term.

A less tenuous solution may be to extract the text from the PDF with a
separate library (perhaps rpdf2txt or PDFBox) and index it using the
standard acts_as_solr.
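To illustrate the extract-then-index idea, here is a minimal sketch under some loud assumptions. The extraction helper `extract_pdf_text` (hypothetical) shells out to the `pdftotext` command from poppler-utils, which is a swapped-in tool rather than the rpdf2txt or PDFBox libraries mentioned above. `KeywordIndex` (also hypothetical) is a tiny in-memory inverted index that stands in for Solr only to show the shape of the search; in a real Rails app you would store the extracted text on a model and let acts_as_solr index it.

```ruby
require "open3"
require "set"

# Hypothetical helper (assumption: poppler-utils' `pdftotext` is installed).
# Passing "-" as the output file makes pdftotext write the text to stdout.
def extract_pdf_text(path)
  text, status = Open3.capture2("pdftotext", path, "-")
  raise "pdftotext failed for #{path}" unless status.success?
  text
end

# Hypothetical in-memory inverted index standing in for Solr:
# maps each lowercase word to the set of document ids containing it.
class KeywordIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index the extracted text of one document under the given id.
  def add(doc_id, text)
    tokenize(text).each { |word| @postings[word] << doc_id }
  end

  # Return the sorted ids of documents containing every keyword (AND search).
  def search(query)
    words = tokenize(query)
    return [] if words.empty?
    # fetch (not []) so that unknown words do not create empty entries
    words.map { |w| @postings.fetch(w, Set.new) }.reduce(:&).to_a.sort
  end

  private

  # Split text into unique lowercase alphanumeric tokens.
  def tokenize(text)
    text.downcase.scan(/[[:alnum:]]+/).uniq
  end
end
```

Usage, with literal strings in place of extracted PDF text:

```ruby
index = KeywordIndex.new
index.add(1, "Invoice total for June")
index.add(2, "June meeting notes")
index.search("june")          # => [1, 2]
index.search("invoice JUNE")  # => [1]
```

Keeping extraction separate from indexing is the point of the suggestion above: the extractor can be replaced (rpdf2txt, PDFBox under JRuby, or a command-line tool) without touching the search side.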

Mark