Forum: Rails Engines Indexed Search Engine vs MySql fulltext search

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
8520bd1ba7ef07ee81484fbda8ef4cbb?d=identicon&s=25 Henrik =?iso-8859-1?Q?Orm=E5sen?= (Guest)
on 2006-03-13 19:51
(Received via mailing list)
Sorry for my unknowings, but what's the pros and cons for Search
Engine vs. MySql fulltext search (mfs)?

I'm using mfs now, and wonder if going over to the Search Engine is an
good idea :-).

- Henrik
D046cca1a33655b6285065ec89711389?d=identicon&s=25 Lance Ball (Guest)
on 2006-03-13 21:26
(Received via mailing list)
Hi Henrik

MySql fulltext search is fine if you have a relatively small data set
and if
the data in each row is relatively small.  Indexed search is really
beneficial where you have a lot of data.  Doing full table scans (like
you'd
get with fulltext searches) on large data sets is going to slow down
your
searches quite a bit.

In addition, fulltext search requires you to be smart about your search
phrase.  For example, if I have a row in my database with this text in
it:

"Descendents of class
Exception<http://www.ruby-doc.org/core/classes/Exception.html>are used
to communicate between
raise methods and rescue statements in begin/end blocks.
Exception<http://www.ruby-doc.org/core/classes/Exception.htm...
carry information about the exceptionâ??its type (the exception's
class name), an optional descriptive string, and optional traceback
information. Programs may subclass
Exception<http://www.ruby-doc.org/core/classes/Exception.html>to add
additional information."

A fulltext search for "exception traceback" will not return that record,
whereas an indexed search for that phrase will.  And a search for
"classes"
(as opposed to "class") will also not return that record, but an indexed
search will.

Indexed search does not use table scans.  Instead, each word in your
record
is stored individually and indexed.

When you create/update a record, you submit the content to be indexed to
the
engine.  The engine tokenizes the content, removes stop words such as
"and",
"or", "of", etc, and then stems each word - so that something like
"running"
becomes "run" in the database.  Each stemmed word, or term, is stored in
a
separate table with a pointer to the original record.  This allows the
"term" column to be indexed, and so queries on this column are very fast
-
table scans are not involved.

So, overall, I'd say the benefits of Indexed Search Engine are:

* Speed on large datasets.  There are some performance tests in the
codebase
if you are interested in doing some benchmarks.  I haven't published
these
results because it's all pretty relative.  I'm running the tests on my 2
year old G4 powerbook with less than 1GB RAM.  On a different system
your
results may very well be different.

* Flexiblity on searches.  You don't have to get the phrase exactly
right to
get a search hit.   Search phrases are tokenized and stemmed in the same
way
that the content is, and searches occur at the term level.

* Stored content and search phrases are stemmed so that a search for
"growing" will return records containing "grows", "growing", "grow",
etc.

* A downside to indexed_search is that you may end up with a very long
terms
table.  And the index for this table is likely to be pretty big.  You're
basically trading storage for performance.

I hope that helps you understand the differences.  If you need more
info,
drop me a note.

Lance
D449d54c3b0f8c9930c11c7d7d3e6cdd?d=identicon&s=25 Surendra Singhi (Guest)
on 2006-03-15 11:55
(Received via mailing list)
"Lance Ball" <lanceball@gmail.com> writes:

> I hope that helps you understand the differences.  If you need more info,
> drop me a note.

Hello Lance,
  Thanks for the detailed post containing the differences between mysql
search
and the Indexed Search Engine.
It would be great if you also outline the differences between using
ferret and
indexed_search_engine.

Thanks.
--
Surendra Singhi
http://ssinghi.kreeti.com, http://www.kreeti.com
Read my blog at: http://cuttingtheredtape.blogspot.com/
,----
| "War is Peace! Freedom is Slavery! Ignorance is Strength!"
|     -- Orwell, 1984, 1948
`----
This topic is locked and can not be replied to.