I am using Ferret right now, and it works great for all my regular text
documents/information. My problem arises when I want to index/search all
our assets (mostly PDF files). Currently, there is no way to READ PDFs in
Ruby. Because of this I have to resort to using Java to read the PDFs and
then Lucene to index them. My problem here is a couple of things.
One, to index an asset I have to either fire up a complete new JVM for each
asset, or have the index rebuilt each night at a set time. Each way has
its own advantages/downfalls, but the biggest problem is that Ferret doesn't
seem to talk to Lucene-created indexes, doh!
So, on to number two. I can go at this from a couple of angles. I could
create a Java web service to do the indexing and the searching and then
return the results. Or I could simply write a small utility program (in
Groovy perhaps?) that uses Java just to get the content of the PDF files, and
use Ferret for everything else. Or some combination of the two, or
something completely different.
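To make the second angle concrete, here is a rough sketch of what I have in
mind, not a working solution. It assumes a hypothetical command-line extractor
(the "pdf-to-text.jar" name is made up) that takes a PDF path as its last
argument and prints the plain text to stdout. The index can be anything that
accepts << with a hash, which I believe a Ferret index does.

```ruby
require "open3"

# Hypothetical extractor command (the jar name is an assumption): any program
# that takes a PDF path as its last argument and prints plain text to stdout.
EXTRACTOR = ["java", "-jar", "pdf-to-text.jar"].freeze

# Runs the extractor on one PDF and returns its text, or nil on failure.
def extract_text(path, extractor = EXTRACTOR)
  text, status = Open3.capture2(*extractor, path)
  status.success? ? text : nil
rescue SystemCallError
  nil # extractor program missing or not executable
end

# Adds one document per PDF to any index object that accepts `<<` with a
# hash. Returns the list of paths that could not be extracted.
def index_pdfs(index, paths, extractor = EXTRACTOR)
  paths.each_with_object([]) do |path, failed|
    text = extract_text(path, extractor)
    if text
      index << { :file => path, :content => text }
    else
      failed << path
    end
  end
end
```

Usage would be something like index_pdfs(index, Dir.glob("assets/**/*.pdf")).
Note that this still starts one JVM per file, so it only fixes the
"Ferret can't read Lucene indexes" half of the problem.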
I’m interested in what you folks out there have to say about this. I’d
really, really like to avoid creating a whole web service just for this,
but if that’s the most viable way then I may go that route.
-Nick “searching for a clue” S