I am using Ferret right now, and it works great for all my regular text
documents/information. My problem arises when I want to index and search
all of our assets (mostly PDF files). Currently, there is no way to READ
PDFs from Ruby. Because of this, I have to resort to using Java to read
the PDFs and then Lucene to index them. This causes a couple of problems.
One, to index an asset I either have to fire up a complete new JVM for
each asset, or have the index rebuilt each night at a set time. Each
approach has its own advantages and drawbacks, but the biggest problem is
that Ferret doesn’t like to talk to Lucene-created indexes. Doh!
So, on to number two: I can go at this from a couple of angles. I could
create a Java web service to do the indexing and the searching and then
return the results. Or I could simply write a small utility program (with
Groovy perhaps?) that uses Java just to get the content of the PDF files,
and use Ferret for everything else. Or some combination of the two, or
something completely different.
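
To make the second option concrete, here is a rough sketch of what the
Ruby side might look like. It assumes a small command-line extractor that
takes a PDF path as its last argument and prints the plain text to
stdout. The jar name and both method names are made up for illustration;
they are not part of Ferret or of any existing tool.

```ruby
require "open3"

# Hypothetical extractor command: any program that accepts a PDF path as
# its last argument and writes the extracted text to stdout. The jar name
# is an assumption for this sketch, not a real utility.
DEFAULT_EXTRACTOR = ["java", "-jar", "pdf-text-extractor.jar"].freeze

# Runs the external extractor on one file and returns its stdout.
# Raises if the command exits with a non-zero status.
def extract_text(path, extractor = DEFAULT_EXTRACTOR)
  output, status = Open3.capture2(*extractor, path)
  raise "extractor failed for #{path}" unless status.success?
  output
end

# Builds a document hash and appends it to the index. `index` only needs
# to respond to `<<`, so a plain Array works for trying this out.
def index_asset(index, path, extractor = DEFAULT_EXTRACTOR)
  doc = { :file => path, :content => extract_text(path, extractor) }
  index << doc
  doc
end
```

With Ferret, the `index` argument would presumably be something like
`Ferret::Index::Index.new(:path => "/path/to/index")`, which as far as I
know accepts hashes through `<<`. Note that this still starts one process
per asset, so a real extractor would probably want to accept a batch of
files in one run to avoid paying the JVM startup cost each time.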
I’m interested in what you folks out there have to say about this. I
would really, really like to avoid creating a whole web service just for
searching, but if that’s the most viable way then I may go that route.
-Nick “searching for a clue” S