Help on searching large sets of data

I am reaching out to get some help/ideas about our current search
implementation.

We are currently using Ferret as our search engine and it's not keeping
up with our needs… we are quite needy ;). We have 5m+ documents
that total around 8-10 GB of data, and indexing can take up to half a
day, which is obviously not where we want to be.

I’ve been looking into Solr and Sphinx and I’m wondering if anyone
has any thoughts on which would best suit our needs. What are the
pros/cons of each?

Another search related question I have is how would one go about
classifying documents based on content? Does Solr or Sphinx have this
capability? An even better question: how do I go about displaying
categories (and their counts) for a specific search term? For example, if
I search for ‘hat’ I want a category browse like the following:

Accessories (30,940)

Children (24,090)

Crochet (19,884)

How is this done? FYI for a live example of what I want check out:

Any help would be greatly appreciated. Thanks

On 12/11/2009 06:50 PM, Mr Balla wrote:

pros/cons of each?
We would have to know what your needs are for that. So far I have
understood that you have a particular volume of files you want to
index and search. I have no idea what kinds of documents you have
and what searches you want to be able to do (only words, combinations
etc.). Even with your live example I’m not seeing things any clearer
(might be due to my lack of knowledge about Ferret and their website
being unresponsive for me).

Another search related question I have is how would one go about
classifying documents based on content? Does Solr or Sphinx have this
capability?

Solr doesn’t - judging from their website.

Any help would be greatly appreciated. Thanks

If you actually switch, this might be of help
http://blog.xing.com/2009/07/migrating-our-search-from-ferret-to-solr/

Kind regards

robert

Thanks for the reply. As for your question on what type of documents we
are indexing, the answer is it's very similar to that site I mentioned or
an eBay-like site.

So there are a bunch of items that people are selling/buying and I would
like a user to be able to search those items against title, description,
keywords etc.

On Fri, Dec 11, 2009 at 3:09 PM, Mr Balla wrote:

Thanks for the reply. As for your question on what type of documents we
are indexing, the answer is it's very similar to that site I mentioned or
an eBay-like site.

So there are a bunch of items that people are selling/buying and I would
like a user to be able to search those items against title, description,
keywords etc.

Then you shouldn’t be searching ‘documents’ at all; you should have
those attributes as separate fields in the DB with proper indexes.
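
To make the "separate fields with indexes" idea concrete, here is a minimal sketch of how per-category counts for a search term could be produced with a plain GROUP BY. Everything in it is an assumption for illustration only: the `items` table, its columns, the sample rows, and the helper names `build_db` and `category_counts` are hypothetical and not taken from anyone's actual schema. It uses SQLite from the Python standard library just so it runs anywhere.

```python
import sqlite3

# Hypothetical sample rows: (title, description, category).
SAMPLE_ROWS = [
    ("Knitted hat", "Warm wool hat", "Accessories"),
    ("Sun hat", "Wide brim", "Accessories"),
    ("Baby hat", "Soft cotton", "Children"),
    ("Crochet hat pattern", "PDF pattern", "Crochet"),
    ("Scarf", "Long wool scarf", "Accessories"),
]


def build_db(rows):
    """Create an in-memory DB with one column per attribute (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " description TEXT,"
        " category TEXT NOT NULL)"
    )
    # Index on the column we group by.
    conn.execute("CREATE INDEX idx_items_category ON items (category)")
    conn.executemany(
        "INSERT INTO items (title, description, category) VALUES (?, ?, ?)",
        rows,
    )
    return conn


def category_counts(conn, term):
    """Return (category, count) pairs for items whose title or description
    contains the term, largest category first."""
    pattern = f"%{term}%"
    cur = conn.execute(
        "SELECT category, COUNT(*) AS n FROM items"
        " WHERE title LIKE ? OR description LIKE ?"
        " GROUP BY category"
        " ORDER BY n DESC, category",
        (pattern, pattern),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_db(SAMPLE_ROWS)
    for category, n in category_counts(conn, "hat"):
        print(f"{category} ({n})")
```

One caveat on the design: a `LIKE` pattern with a leading wildcard cannot use an ordinary index on `title` or `description`, so on millions of rows the text match itself would likely still need a full-text index or a dedicated search engine. The sketch only shows the shape of the category-count query, not a tuned solution.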