Sphinx vs ferret

vince · January 4, 2008, 5:28pm

I’ve got a smallish site with not a ton of data at the moment… but
all that could change at some point so I’d like to plan with that in
mind. Currently I’m deployed on an nginx/mongrel stack that works
quite well. My site uses Ferret for search and it’s ok… the big
problem is that some terms don’t show up as expected… especially if
there are apostrophes, plurals, etc involved.

I’ve got two choices that I see… pony up for the O’Reilly mini-PDF and
tweak ferret settings, or scrap ferret and go with Sphinx (and hope it
handles cases like this better). I’m not sure how much time the
latter would take me but, assuming that I’m going to spend somewhere
around 40 hours anyway, which route would you all recommend?

Thanks for your time,
Vince

vince · January 4, 2008, 5:34pm

On Jan 4, 2008, at 8:26 AM, Vince W. wrote:

> handles cases like this better). I’m not sure how much time the
> latter would take me but, assuming that I’m going to spend somewhere
> around 40 hours anyway, which route would you all recommend?

We’ve used ferret on past projects… and now use sphinx. We’re not
likely going back to ferret.

Robby

–
Robby R.
Founder and Executive Director

PLANET ARGON, LLC
Design, Development, and Hosting with Ruby on Rails

http://www.robbyonrails.com/

+1 503 445 2457
+1 877 55 ARGON [toll free]
+1 815 642 4068 [fax]

vince · January 4, 2008, 5:54pm

If you’d consider using PostgreSQL, then tsearch2 is awesome. It’s built
into the latest version of PostgreSQL.

Ericson S.
CTO
http://www.funadvice.com

vince · January 4, 2008, 6:23pm

> > latter would take me but, assuming that I’m going to spend somewhere
> > around 40 hours anyway, which route would you all recommend?
>
> We’ve used ferret on past projects… and now use sphinx. We’re not
> likely going back to ferret.

Can you elaborate on why? I’m mostly just curious

To the parent…

The ferret PDF booklet is pretty full of good information
if you stick with ferret. I don’t, however, remember if it discusses how
to handle words with apostrophes in them. It does talk about how to handle
plurals via the StemFilter though.

http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StemFilter.html

-philip
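
To make the apostrophe and plural problem concrete, here is a minimal sketch in plain Ruby of what an analysis step does before terms reach the index. It is not Ferret's API: the method names `normalize_token`, `naive_stem` and `analyze` are made up for illustration, and the plural rule is a deliberately crude stand-in for a real stemmer such as the one behind StemFilter. The point is only that the same normalization has to run at index time and at query time, otherwise "O’Reilly’s" and "oreilly" never meet.

```ruby
# Hypothetical helpers for illustration only; none of this is Ferret's API.

# Lowercase, fold curly apostrophes to straight ones, drop a trailing
# possessive 's, then delete any apostrophes that remain.
def normalize_token(token)
  t = token.downcase.tr("\u2019", "'")
  t = t.sub(/'s\z/, "")
  t.delete("'")
end

# Very naive plural stripping (an assumption, not a real stemmer):
# "queries" -> "query", "plugins" -> "plugin", "class" stays "class".
def naive_stem(token)
  case token
  when /\A(.+)ies\z/ then "#{$1}y"
  when /\A(.+[^s])s\z/ then $1
  else token
  end
end

# Split text into word-like tokens and run each through the same pipeline.
def analyze(text)
  text.scan(/[[:alnum:]'\u2019]+/).map { |tok| naive_stem(normalize_token(tok)) }
end

indexed = analyze("O\u2019Reilly\u2019s plugins")  # what gets stored
query   = analyze("oreilly plugin")                # what the user types
puts indexed == query                              # prints true
```

Whichever engine is used, checking what its analyzer actually emits for a problem term is usually the quickest way to see why a search misses.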

vince · January 4, 2008, 9:37pm

On Jan 4, 2008, at 11:41 AM, Philip H. wrote:

> To the parent…
>
> the ferret PDF booklet is pretty full of good information
> if you stick with ferret. I don’t however remember if it discusses
> how to handle words with apostrophes in it. It does talk about how to
> handle plurals via the StemFilter though.
>
> http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StemFilter.html

Ferret is unstable in production. Segfaults, corrupted indexes
galore. We’ve switched around 40 clients from ferret to sphinx and
solved their problems this way. I will never use ferret again after
all the problems I have seen it cause in people’s production apps.

Plus, sphinx can reindex many, many times faster than ferret and uses
less CPU and memory as well.

Cheers-

Ezra Z.
– Founder & Software Architect
– EngineYard.com

vince · January 4, 2008, 10:10pm

> Ferret is unstable in production

Very true.

A decent search option is Lucene via the acts_as_solr plugin.
I’ve never used Sphinx though. Can anyone with firsthand experience of
both Lucene and Sphinx give their opinion?

–
Alexey V.
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]
RubyWorks [http://rubyworks.thoughtworks.com]

vince · January 4, 2008, 10:36pm

Ferret has been very unstable for us. It is unfortunate because it seems
like it would be more customizable than Sphinx. But I must admit that I
like that Sphinx can take the data by itself from MySQL and index it
really fast.

AEM

–
Adrian Esteban Madrid
Lead Developer, Prefab Markets
http://www.prefabmarkets.com

vince · January 4, 2008, 10:40pm

On Jan 4, 2008, at 1:09 PM, Alexey V. wrote:

> > Ferret is unstable in production
>
> Very true.
>
> A decent search option is Lucene via acts_as_solr plugin.
> I never used Sphinx though. Can anyone with firsthand experience of
> both Lucene and Sphinx give their opinion?

We have a bunch of clients using solr as well. In general it is more
powerful than sphinx but a lot slower to reindex and query. Also, it
uses 50 times the memory of sphinx. If you have a box or VM to put
Solr on by itself then it is a good option as well. But if sphinx can
do everything you need from a search indexer then it is a way better
option cost-wise.

Cheers-

Ezra Z.
– Founder & Software Architect
– EngineYard.com

vince · January 4, 2008, 10:30pm

> Ferret is unstable in production. Segfaults, corrupted indexes
> galore. We’ve switched around 40 clients from ferret to sphinx and
> solved their problems this way. I will never use ferret again after
> all the problems I have seen it cause people’s production apps.

Huh. I must be lucky. Or I don’t have that much to index (true), or my users
don’t complain about not finding anything (probably very true).

I’ll have to give sphinx a go next time around… thanks, Ezra.

vince · January 4, 2008, 10:41pm

On Fri, 2008-01-04 at 12:37 -0800, Ezra Z. wrote:

> Ferret is unstable in production. Segfaults, corrupted indexes
> galore. We’ve switched around 40 clients from ferret to sphinx and
> solved their problems this way. I will never use ferret again after
> all the problems I have seen it cause people’s production apps.

Just out of interest, were corrupted indexes seen even with only one
process writing to the index (via DRb as is recommended)? Multiple
writers are unsupported and cause these kinds of problems.

Segfaults were quite common in older versions too, but it’s settled down
now and I’ve had it rather stable on a few small production sites
(though I’m not talking Twitter-like load :).

John.

http://www.brightbox.co.uk - UK Ruby on Rails hosting
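
The single-writer rule mentioned above can be sketched in a few lines of plain Ruby. This is an in-process analogy only, not Ferret's DRb server and not Ferret's API: `SingleWriterIndex` is a hypothetical class, the Hash stands in for the on-disk index, and a thread-safe `Queue` stands in for the DRb boundary. The idea it illustrates is that any number of clients may submit writes, but exactly one code path ever mutates the index.

```ruby
# Hypothetical sketch: a Hash plays the role of the index, and a single
# writer thread plays the role of the one process that owns it.
class SingleWriterIndex
  def initialize
    @docs = {}            # stand-in for the real index
    @queue = Queue.new    # thread-safe queue of pending writes
    @writer = Thread.new do
      # The only place where @docs is ever modified.
      while (job = @queue.pop) != :stop
        id, text = job
        @docs[id] = text
      end
    end
  end

  # Clients never touch the index directly; they only enqueue a write.
  def add(id, text)
    @queue << [id, text]
  end

  # Let the writer drain everything queued so far, then stop it.
  def close
    @queue << :stop
    @writer.join
    @docs
  end
end

index = SingleWriterIndex.new
workers = 4.times.map do |w|
  Thread.new { 25.times { |i| index.add("doc-#{w}-#{i}", "body #{i}") } }
end
workers.each(&:join)
docs = index.close
puts docs.size   # prints 100
```

With a real index the same shape applies across processes: the app servers send documents to one long-running writer instead of each opening the index for writing.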

vince · January 4, 2008, 10:55pm

On Jan 4, 2008, at 1:41 PM, John L. wrote:

> writers are unsupported and cause these kinds of problems.
>
> Segfaults were quite common in older version too, but it’s settled down
> now and I’ve had it rather stable in a few small production sites
> (though I’m not talking Twitter-like load :).

Yes, we have tried every possible way of running ferret: by itself, with the
DRb server, etc. I really like ferret’s interface and integration with
rails, but unfortunately it causes nothing but problems for so many
people that I cannot recommend it with a straight face. Not meaning to
bash the ferret devs here at all, just stating what I’ve seen
across hundreds of deployments.

Cheers-

Ezra Z.
– Founder & Software Architect
– EngineYard.com

vince · January 4, 2008, 11:15pm

On Fri, 2008-01-04 at 11:26 -0500, Vince W. wrote:

> latter would take me but, assuming that I’m going to spend somewhere
> around 40 hours anyway, which route would you all recommend?

Hi Vince,

They’re different tools really. I’ve found the flexibility of Ferret to
be really quite awesome. I can (in Ruby):

- set boost values independently per field and per record
- write custom text tokenizers, stemmers and stop lists (and use
  different ones per field even)
- highlight matches in results using the same engine that does the
  searching
- manage my own indexes, merging them at will, or just merging results
  from them
- index content generated on the fly, without having to store it in my
  SQL database (pull in all the associated tags for a post as you index it,
  for example)
- store original data in the index (though most people use it to index
  an SQL database anyway)
- other awesome stuff I can’t remember right now

Looking at the documentation for Sphinx (and its usual usage, with
MySQL), many (if not all) of those features are missing. But Sphinx is
reportedly quicker, supports distributed searching, and appears to be
undergoing more development than Ferret is at the moment, so I think it
depends on your needs.

I’d recommend you ask on the Ferret mailing list about your search
result issues though - I’m surprised you’re having problems with that.
I’m sure it can be solved.

John.

http://www.brightbox.co.uk - UK Ruby on Rails hosting

vince · January 5, 2008, 3:48pm

> > A decent search option is Lucene via acts_as_solr plugin.
> > I never used Sphinx though. Can anyone with firsthand experience of
> > both Lucene and Sphinx give their opinion?
>
> …
>
> We have a bunch of clients using solr as well. In general it is more
> powerful than sphinx but a lot slower to reindex and query. Also it
> uses 50 times the memory of sphinx. If you have a box or vm to put
> SOLR on by itself then it is a good option as well. but if sphinx can
> do everything you need from a search indexer then it is a way better
> option cost wise.

I don’t have firsthand experience with sphinx, but I can confirm
that, given a decent hardware setup, solr (with acts_as_solr) is really good
(not only in terms of performance but also flexibility and
functionality). We used it for miojob.it, where it powers almost every
aspect of the site. The site is built around faceted browsing of job
postings and has only a few spots where caching was appropriate, and it
handles traffic in the hundreds of thousands of hits per day without
breaking a sweat (I don’t have the real numbers).

Anyhow, given the lower system requirements, I’d like to give sphinx a
try to see what it can do!

cheers,
Luca M.

http://spazidigitali.com - http://kiaraservice.com

vince · January 7, 2008, 4:07pm

I’ve been humming and hawing all weekend about whether or not to put
in the time to use Sphinx, and I guess the mountain of evidence is
clear: I’ll be moving my project over to Sphinx today.

James

vince · January 7, 2008, 9:20am

I’ve been using Ferret since its beginning, and I’m also the French
translator of the Ferret Shortcut for O’Reilly, and I can tell you one
thing: don’t use Ferret. It’s really unstable and development stopped a
while ago… That’s really sad because it was really an AWESOME product,
but it never reached a stable state.

I’ve also experienced huge problems with acts_as_solr, so in the end I’d
just say “use Sphinx”. For me that’s the safer decision.

–
Jérémie ‘ahFeel’ BORDIER

vince · January 8, 2008, 10:00am

Ya we use ferret right now on our site. It’s ok, but it does segfault
about once a week. It’s not a huge deal I suppose, but doesn’t make me
feel good. Right now I’m evaluating switching to solr or sphinx. It
would be nice to have the ‘more like this’ ability that AAF/Ferret has.
I didn’t really see this feature with sphinx. We would also like to be
able to write a custom sort method, which I haven’t been able to do with
ferret. I see there’s an ability to do that with sphinx which looks
nice.

Anyways, can anyone recommend a sphinx plugin for Rails?
There are 3 that I’ve found so far: acts_as_sphinx, ultrasphinx, and
sphinctor. Are they all actively updated?

Thanks,
Ray

vince · January 12, 2008, 8:52pm

Ultrasphinx is awesome… I use it for many sites,
and I also have some capistrano+ultrasphinx recipes:
http://frederico-araujo.com/2007/12/7/capistrano-2-1-and-ultrasphinx

Sphinx is not quite as complete as SOLR, but almost…

but all I can say is that sphinx itself is a work of art.

It indexes REALLY fast: about 5 seconds for a 25,000-record database.
I have a cron job that updates the index every hour.

/ultrasphinx/production.conf’…
indexing index ‘complete’…
collected 25088 docs, 10.4 MB
sorted 1.7 Mhits, 100.0% done
total 25088 docs, 10409184 bytes
total 5.361 sec, 1941487.86 bytes/sec, 4679.33 docs/sec

Ferret is my second choice, and only because shared hosts won’t support
sphinx…

what a sad thing

vince · January 19, 2008, 12:03am

I’m not sure about acts_as_sphinx and sphinctor being actively
updated, but I can confirm that both Ultrasphinx and Thinking Sphinx
(my own plugin - http://ts.freelancing-gods.com) are regularly updated,
and under the hood they both use the same Ruby Sphinx client -
Riddle (http://riddle.freelancing-gods.com - again, mine - sorry for
blowing my own trumpet), which I’ve been keeping up to date to match
the recent releases of Sphinx.

Evan’s and my plugins do a lot of the same things, just different
approaches, so, with as little bias as possible, I think either can do
the job for you. I can’t speak for the other two plugins though, as
it’s been so long since I’ve looked into them.

Cheers

–
Pat
e: [email protected] || m: 0413 273 337
w: http://freelancing-gods.com || p: 03 9386 0928
discworld: http://ausdwcon.org || skype: patallan

vince · January 19, 2008, 12:40am

On Jan 18, 2008 4:17 PM, Jeff [email protected] wrote:

> …
> How difficult would it be to change over to Sphinx?

That would really depend on how you’re hooking up with Ferret and whether
you were using any advanced features. My guess is that it shouldn’t be too
hard to switch.

–
Adrian Esteban Madrid
Lead Developer, Prefab Markets
http://www.prefabmarkets.com

vince · January 19, 2008, 12:18am

I’ve been playing with Ferret for a while. I actually get corrupted
indexes just running in development. I’m close to deploying an app
that uses ferret and some of the things I’ve heard really worry me.
I haven’t had a chance to test the DRb server though, and the whole idea
of that bothers me too.

How difficult would it be to change over to Sphinx?