FULLTEXT search in MySQL on rails

I tried adding a FULLTEXT search index to a table of mine in MySQL,
only to discover that the InnoDB table format doesn’t seem to support
this feature. Switching to the MyISAM table type seemed to work, but
I seem to have some recollection that ActiveRecord transactions aren’t
fully atomic on MyISAM tables. Is this true or am I just remembering
wrong? If so, does anyone have any suggestions for doing fulltext
searches in rails? I found a couple of links on the wiki but they
only use LIKE syntax, which doesn’t rank searches by word frequency
like FULLTEXT searches do.

Thanks,
Carl

Yeah, you have to use a separate MyISAM table for searches. But you
can run both MyISAM and InnoDB tables in the same database, so it
isn’t really a hindrance. Just build a specific table for searches.

Hope that helps!

caleb

On 11/18/05, Carl Y. [email protected] wrote:

Thanks,
Carl


Rails mailing list
[email protected]
http://lists.rubyonrails.org/mailman/listinfo/rails


caleb
http://www.ruejulesverne.com

I’m not sure what you mean. Do you mean store duplicate instances of
my data, one for updating and keeping track of with ActiveRecord, and
one for searching?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Nov 18, 2005, at 6:24 PM, Carl Y. wrote:

I tried adding a FULLTEXT search index to a table of mine in MySQL,
only to discover that the InnoDB table format doesn’t seem to support
this feature. Switching to the MyISAM table type seemed to work, but
I seem to have some recollection that ActiveRecord transactions aren’t
fully atomic on MyISAM tables. Is this true or am I just remembering
wrong? If so, does anyone have any suggestions for doing fulltext
searches in rails? I found a couple of links on the wiki but they
only use LIKE syntax, which doesn’t rank searches by word frequency
like FULLTEXT searches do.

I don’t know how involved you want to get, but a common way is to use
InnoDB in the master database then replicate to a slave which uses
MyISAM.
Then perform your fulltext queries against the slave. This eliminates
search load from your main database as well.

Other possibilities:

    • mirror your data in the InnoDB table foobars to an auxiliary MyISAM
      table
      foobars_search with callbacks on the Foo model to keep the two in
      sync.
    • dump MySQL fulltext in favor of Odeum or Ferret.

Best,
jeremy
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)

iD8DBQFDfpRoAQHALep9HFYRAhYoAKCTFjtY/olDCEqeHoLQZaa3Br+lbQCfQ1s4
yW/XK1xBg8B9yW/wbux3vpk=
=L3G6
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Nov 18, 2005, at 7:14 PM, Carl Y. wrote:

Wow. I’m surprised to find that it is so difficult. Thanks for the
info. This makes me wonder… Back in the early days of mysql they
recommended using LOCK/UNLOCK TABLES to achieve transaction-like
functionality. Monty’s argument was that myisam is about 3 times
faster than fully ACID-compliant tables and the act of locking tables
is really low overhead. Also, you only need to lock the tables that
are needed for the transaction.

It’s fast for one writer but very slow for many. InnoDB often
outperforms MyISAM due to writer concurrency alone (it has row locks
whereas MyISAM has table locks only.)

How hard would it be to add support for MyISAM tables in ActiveRecord
by locking the tables needed for a query instead of using InnoDB
transactions? I think this would be a useful alternative to InnoDB,
not just because it would allow people to use FULLTEXT but also
because it might actually be faster in some cases.

Table locking cannot emulate transactions. Otherwise, MyISAM would
have them. It gives you a small portion of ACID (isolation only) and
no rollback capability.

Thoughts?

I think the best option is for MySQL to implement fulltext indexes on
InnoDB tables.

jeremy
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)

iD8DBQFDfqIVAQHALep9HFYRAkFEAJ4h/N7R+0Z7HWTZ2g7cNY/UDccE/QCfalOx
ndefj9U8xhapUhSko0A3f5I=
=RbOj
-----END PGP SIGNATURE-----

Wow. I’m surprised to find that it is so difficult. Thanks for the
info. This makes me wonder… Back in the early days of mysql they
recommended using LOCK/UNLOCK TABLES to achieve transaction-like
functionality. Monty’s argument was that myisam is about 3 times
faster than fully ACID-compliant tables and the act of locking tables
is really low overhead. Also, you only need to lock the tables that
are needed for the transaction.

How hard would it be to add support for MyISAM tables in ActiveRecord
by locking the tables needed for a query instead of using InnoDB
transactions? I think this would be a useful alternative to InnoDB,
not just because it would allow people to use FULLTEXT but also
because it might actually be faster in some cases.

Thoughts?

Carl

Hi, Carl,

is there a specific reason why you don’t think about using a ‘fulltext
search specialist’ for the job? MySQL-Fulltext is often (have a look at
O'Reilly Media - Technology and Business Training for example) considered inferior
to ‘real’ search engines performance- and feature-wise. And the great
part: You already have the choice: Try ferret, try ruby-odeum, try
lucene with bindings or as a web-service. I’m sure there will be the
perfect fit for you!

regards
Jan P.

Hi Carl,

Please do check out Ferret. Jan has written a great howto on
integrating Ferret with Rails here;

http://wiki.rubyonrails.com/rails/pages/HowToIntegrateFerretWithRails

Also, you might like to check out the quickstart tutorial here;

http://ferret.davebalmain.com/api/files/TUTORIAL.html

I’ve tried to make the API as simple as possible so I don’t think it’d
be too much more work than using MySQL full text search and the
results you’ll be getting will be a lot better. And if it’s speed
you’re worried about, I expect to have the full C backed version out
by the end of the month which should be faster than Java Lucene. I
don’t know how it will compare with MySQL full text search but I’m
pretty sure it wouldn’t be the bottleneck in your app.

Cheers,
Dave

Thanks Jan and Dave for the encouragement. The mere thought of
abandoning an RDBMS for the barbaric wastes of the filesystem filled
me with dread. But now that I read that tutorial it really isn’t all
that hard. And I think it will probably improve the performance a
good deal. I’ll give it a try. I especially like how the tutorial
explains how to keep your index in sync with your database.

Carl

On 11/19/05, Jan P. [email protected] wrote:

regards
Jan P.

GREAT! Please notice, that Dave made big process since I’ve posted the
howto. Notice further the thread on the mailing list about “best
practices” of the integration of the indexer. You might mix it into your
models. And finally: Just ask if you have problems. I’m sure you won’t
regret your decision of stepping into the world of Ferret/Lucene. In my
java-projects Lucene always happens to be one of the most important
parts, mostly because of it’s performance and its capability of fuzzy
searches!

regards
Jan P.

On 11/20/05, Carl Y. [email protected] wrote:

Hey Dave, have you considered using RubyInline to speed up Ferret? It
might be a lot easier than writing a C module.

Hey Carl,
I’ve looked into this and I can’t get the same speed up as I can get
by writing the full indexer in pure C. In fact, I even started writing
each Ruby class in C (which should be quicker than ruby inline) but
even this was nowhere near as quick as going the pure C route. The
current version has the 10 most heavily used classes rewritten in C
already, but I’m still expecting a 50 times speed up when I integrate
the full C version.

Dave

Hey Dave, have you considered using RubyInline to speed up Ferret? It
might be a lot easier than writing a C module.

On 19 Nov 2005, at 23:15, David B. wrote:

speed up there until I’ve actually finished the work, but I’m fairly
confident that it will be faster.

Fair enough… but I’ve already seen folks taking your speed-up
claims to be against Java Lucene. And while I’m perhaps being a bit
defensive of my good pal “Java Lucene”, I think that folks are
getting the wrong impression about your speed-up predictions.

I don’t want to put a damper on your work or enthusiasm at all, just
want to let you know the word of mouth impressions I’m getting from
folks about Ferret vs. Java Lucene.

An interesting aspect of it is that as long as Ferret is fast enough,
even if it was somewhat faster than Java Lucene (which I’m still
cautiously skeptical of, but certainly optimistic), that alone
wouldn’t likely be the criteria for folks switching from Java.

I really like your confidence! Very impressive what you’ve done so
far, and I’m thrilled beyond belief! Thank you for Ferret, and keep
up the great work.

 Erik

On 11/20/05, David B. [email protected] wrote:

already, but I’m still expecting a 50 times speed up when I integrate
the full C version.

Just to clarify this statement, I mean a 50 times speed up on what I
was able to achieve with my class by class C extensions, not 50 times
faster than Apache Lucene. I don’t want to make any claims about the
speed up there until I’ve actually finished the work, but I’m fairly
confident that it will be faster.

Cheers,
Dave

Hi Erik,
Thanks for the feedback. I’ll definitely try and be a bit more careful
what I say in future. Hopefully my results will do the talking. :wink:

Cheers,
Dave