Is there any plugin that can search in PDF documents? For example,
a user should be able to search for keywords in the PDF contents.
Good morning -
On 3-Jun-08, at 1:25 AM, ripan wrote:
Is there any plugin that can search in PDF documents? For example,
a user should be able to search for keywords in the PDF contents.
Someone submitted a patch for acts_as_solr to index documents - read
the google group for this project
J
Is there any plugin that can search in PDF documents?
Maybe you can try this: http://raa.ruby-lang.org/project/rpdf2txt/
or JRoR and one of the many Java PDF libraries. I’m not aware of a
Rails plugin.
Someone submitted a patch for acts_as_solr to index documents - read
the google group for this project
I didn’t think Solr would do this, since it provides indexing and
querying but not parsing of rich formats. However, there seems to be
a patch that extracts text (but not metadata) from rich documents into
Solr: see the UpdateRichDocuments page for Solr at the Apache Software
Foundation. The Solr committers are reluctant to use that patch,
though, and would rather build a bridge from Apache Tika to Solr, even
if that is further down the road.
I did find the patch to acts_as_solr here:
http://www.nabble.com/Rich-Document-support-for-solr-ruby-and-acts_as_solr-p17161561.html
But since this patch depends on the uncommitted Solr patch, I wouldn’t
count on it being viable in the long term.
A less tenuous solution may be to extract the text from a PDF via some
other library (perhaps rpdf2txt or PDFBox), and index it using the
standard acts_as_solr.
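
A rough sketch of that extract-then-index idea follows. Everything
named here is an assumption for illustration: pdf_to_text and
PdfDocument are hypothetical, and the default extractor shells out to
the pdftotext command-line tool rather than rpdf2txt or PDFBox, simply
because its invocation is easy to show. The acts_as_solr part appears
only in comments, since it needs a running Rails app and Solr.

```ruby
require "open3"

# Default extractor (an assumption): run the `pdftotext` command-line
# tool and read the text from stdout. rpdf2txt or PDFBox could be
# swapped in by passing a different callable.
PDFTOTEXT = lambda do |path|
  text, status = Open3.capture2("pdftotext", path, "-")
  raise "pdftotext failed for #{path}" unless status.success?
  text
end

# Hypothetical helper: returns the PDF's text with runs of whitespace
# collapsed to single spaces.
def pdf_to_text(path, extractor = PDFTOTEXT)
  extractor.call(path).gsub(/\s+/, " ").strip
end

# Hypothetical stand-in for a model. In a Rails app this would be an
# ActiveRecord model with a text column, declared with something like
#   acts_as_solr :fields => [:body_text]
# and filled in from a before_save callback.
PdfDocument = Struct.new(:path, :body_text) do
  def extract!(extractor = PDFTOTEXT)
    self.body_text = pdf_to_text(path, extractor)
    self
  end
end
```

Once the extracted text sits in an ordinary indexed column, a keyword
search should be the usual acts_as_solr query, presumably along the
lines of PdfDocument.find_by_solr("keyword"); check the plugin's README
for the exact options.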
- Mark.