Indexed Search Engine vs MySql fulltext search


#1

Sorry for my unknowings, but what’s the pros and cons for Search
Engine vs. MySql fulltext search (mfs)?

I’m using mfs now, and wonder if going over to the Search Engine is an
good idea :-).

  • Henrik

#2

Hi Henrik

MySql fulltext search is fine if you have a relatively small data set
and if
the data in each row is relatively small. Indexed search is really
beneficial where you have a lot of data. Doing full table scans (like
you’d
get with fulltext searches) on large data sets is going to slow down
your
searches quite a bit.

In addition, fulltext search requires you to be smart about your search
phrase. For example, if I have a row in my database with this text in
it:

“Descendents of class
Exceptionhttp://www.ruby-doc.org/core/classes/Exception.htmlare used
to communicate between
raise methods and rescue statements in begin/end blocks.
Exceptionhttp://www.ruby-doc.org/core/classes/Exception.htmlobjects
carry information about the exceptionâ??its type (the exception’s
class name), an optional descriptive string, and optional traceback
information. Programs may subclass
Exceptionhttp://www.ruby-doc.org/core/classes/Exception.htmlto add
additional information.”

A fulltext search for “exception traceback” will not return that record,
whereas an indexed search for that phrase will. And a search for
“classes”
(as opposed to “class”) will also not return that record, but an indexed
search will.

Indexed search does not use table scans. Instead, each word in your
record
is stored individually and indexed.

When you create/update a record, you submit the content to be indexed to
the
engine. The engine tokenizes the content, removes stop words such as
“and”,
“or”, “of”, etc, and then stems each word - so that something like
“running”
becomes “run” in the database. Each stemmed word, or term, is stored in
a
separate table with a pointer to the original record. This allows the
“term” column to be indexed, and so queries on this column are very fast

table scans are not involved.

So, overall, I’d say the benefits of Indexed Search Engine are:

  • Speed on large datasets. There are some performance tests in the
    codebase
    if you are interested in doing some benchmarks. I haven’t published
    these
    results because it’s all pretty relative. I’m running the tests on my 2
    year old G4 powerbook with less than 1GB RAM. On a different system
    your
    results may very well be different.

  • Flexiblity on searches. You don’t have to get the phrase exactly
    right to
    get a search hit. Search phrases are tokenized and stemmed in the same
    way
    that the content is, and searches occur at the term level.

  • Stored content and search phrases are stemmed so that a search for
    “growing” will return records containing “grows”, “growing”, “grow”,
    etc.

  • A downside to indexed_search is that you may end up with a very long
    terms
    table. And the index for this table is likely to be pretty big. You’re
    basically trading storage for performance.

I hope that helps you understand the differences. If you need more
info,
drop me a note.

Lance


#3

“Lance B.” removed_email_address@domain.invalid writes:

I hope that helps you understand the differences. If you need more info,
drop me a note.

Hello Lance,
Thanks for the detailed post containing the differences between mysql
search
and the Indexed Search Engine.
It would be great if you also outline the differences between using
ferret and
indexed_search_engine.

Thanks.

Surendra S.
http://ssinghi.kreeti.com, http://www.kreeti.com
Read my blog at: http://cuttingtheredtape.blogspot.com/
,----
| “War is Peace! Freedom is Slavery! Ignorance is Strength!”
| – Orwell, 1984, 1948
`----