Forum: Ruby on Rails Full text search using tsearch2

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
06cc84260bff88e4217a2f8843b03b9d?d=identicon&s=25 VP (Guest)
on 2008-12-02 22:24
(Received via mailing list)
Hi,

I implemented a full text search using postgres > 8.3.
My performance is very good to index (using triggers/after_filter or
crontab time based index update) and to search for something.  With
Postgres i can index only lexemes, other solutions, like sphinx based
or aaf creates the index based in length. So everything bigger than
2/3 chars is indexed. I think to use a lexeme is better to avoid
falses positives in a real world.

My questions are:

- Any good reason to dont use tsearch and choose a aaf/*sphinx
solution ?
- There is any way to make tsearch index the text by length and not by
lexeme (i know, probably this question i should ask in a postgres
forum)

Regards,

Victor
A82ba1167f4d4a8d1de63820e576a87f?d=identicon&s=25 Robby Russell (Guest)
on 2008-12-03 05:35
(Received via mailing list)
On Tue, Dec 2, 2008 at 11:43 AM, VP <pelasaco@gmail.com> wrote:
>
> My questions are:
>
> - Any good reason to dont use tsearch and choose a aaf/*sphinx
> solution ?
> - There is any way to make tsearch index the text by length and not by
> lexeme (i know, probably this question i should ask in a postgres
> forum)

Victor,

As a long-time PostgreSQL advocate... I'd encourage you to look at
using Sphinx and/or UltraSphinx.

Why? Configuration is way less complicated for your development and
production environments. I like knowing that I have the advanced
features of PostgreSQL available to me... but I'll side with working
within Ruby as much as possible on our projects. Sphinx will allow you
to evolve your indexing without needing to muck around the database,
which should ease pains with deployments as well.

If you get to a point where Sphinx is falling over from not being to
handle your load, you can evaluate using Tsearch2, but in the
meantime, I'd just consider it premature optimization without much
benefits int he short-term.

Having said that... I haven't worked with tsearch2 in 3+ years. ;-)

Good luck!

Robby
--
Robby Russell
Chief Evangelist, Partner

PLANET ARGON, LLC
design // development // hosting

http://www.planetargon.com/
http://www.robbyonrails.com/
aim: planetargon

+1 503 445 2457
+1 877 55 ARGON [toll free]
+1 815 642 4068 [fax]
06cc84260bff88e4217a2f8843b03b9d?d=identicon&s=25 VP (Guest)
on 2008-12-04 14:15
(Received via mailing list)
Hey Robby,

Thanks for your feedback,
>
> Victor,
>
> As a long-time PostgreSQL advocate... I'd encourage you to look at
> using Sphinx and/or UltraSphinx.

In fact, I used ultrasphinx, and with an average load, it wasnt
uncommon to have to restart  ultrasphinx daemons. Another point that,
i tried to explain in my question, is the way that the data is
indexed. In ultrasphinx, the document is broken in tokens and
everything bigger then 2 chars is indexed. So in my opinion, it
generates too much "false positives" (sorry guys, i could not find
another word).. there is any way to make *sphinx index lexeme ?

> Why? Configuration is way less complicated for your development and
> production environments. I like knowing that I have the advanced
> features of PostgreSQL available to me... but I'll side with working
> within Ruby as much as possible on our projects. Sphinx will allow you
> to evolve your indexing without needing to muck around the database,
> which should ease pains with deployments as well.

I think when i remove an extra daemon, i'm drying my system
architecture, right ?
I mean, if you dont really spend some hours reading (ultra|think)
sphinx code, it will be always a black box in your system that you
dont understand. So am I loosing something ? maybe the learn courve to
tsearch and all postgres caveats are bigger then *really* learn how
*sphinx based solutions works ?

Best regards,

Victor
2d8132658d56e51f19ace1c68e48b6aa?d=identicon&s=25 Thorsten Mueller (thorsten)
on 2008-12-04 14:25
(Received via mailing list)
> In ultrasphinx, the document is broken in tokens and
> everything bigger then 2 chars is indexed. So in my opinion, it
> generates too much "false positives" (sorry guys, i could not find
> another word).. there is any way to make *sphinx index lexeme ?
>

Sure, you can configure min_infix_len, min_word_len
and whatever in your sphinx configuration.

> I mean, if you dont really spend some hours reading (ultra|think)
> sphinx code, it will be always a black box in your system that you
> dont understand.

Whow, so you read Postgres code before using it? ;-)
Same goes for Ruby, Rails, and 99% of what tools we use.
06cc84260bff88e4217a2f8843b03b9d?d=identicon&s=25 VP (Guest)
on 2008-12-04 14:40
(Received via mailing list)
> Whow, so you read Postgres code before using it? ;-)
> Same goes for Ruby, Rails, and 99% of what tools we use.

lol.. no, i mean at least the contrib/tsearch folder i read! ok the
http://www.postgresql.org/docs/8.3/static/textsearch.html as well, and
too be honest its better documented then the sphinx plugins :-)

Thanks for the tip about ultrasphinx, but i would like to know if its
possible to do opposite, to make it work with lexemes. Like for
example if you look for run, you will find run, runs, ran and running.
All of these words are forms of the same lexeme: RUN

I dont wanna be troll or blame the plugins, i heard a lot of good
things about thinksphinx, not so good things about ultrasphinx and
almost nothing about tsearch, that for me is working fine, except that
some developers here comes with the questions: "if i type "te" it dont
bring me my "test" post and ultrasphinx does". But well in real world
users wont search for "te" and if they search for te, they will find
everything like te,teahupo, tea, teta, tendency, etc, etc, etc.

Regards,

Victor
2d8132658d56e51f19ace1c68e48b6aa?d=identicon&s=25 Thorsten Mueller (thorsten)
on 2008-12-04 15:02
(Received via mailing list)
> Thanks for the tip about ultrasphinx, but i would like to know if its
> possible to do opposite, to make it work with lexemes. Like for
> example if you look for run, you will find run, runs, ran and running.
> All of these words are forms of the same lexeme: RUN

hmm, I don't know about "ran"
but if you enable star search, then ru* wil at least find
run, runs, and running.
This may be possible too, but I can't say for sure, since I didn't
need this kind of search
Maybe have a look at the docs:
http://www.sphinxsearch.com/docs/current.html#searching

>
> I dont wanna be troll or blame the plugins, i heard a lot of good
> things about thinksphinx, not so good things about ultrasphinx and
> almost nothing about tsearch, that for me is working fine, except that
> some developers here comes with the questions: "if i type "te" it dont
> bring me my "test" post and ultrasphinx does". But well in real world
> users wont search for "te" and if they search for te, they will find
> everything like te,teahupo, tea, teta, tendency, etc, etc, etc.
>

lo, I agree. It was hard enough to read through lots of tutorials
and documents. We started with ultrasphinx at some time back and
it worked good enough for our needs. For new projects I would
think about changing that to thinksphinx. Especially if I expect
the index to be combined from several tables.
But for basic search both should do well enough.
81b61875e41eaa58887543635d556fca?d=identicon&s=25 Frederick Cheung (Guest)
on 2008-12-04 15:03
(Received via mailing list)
On 4 Dec 2008, at 13:39, VP wrote:

> possible to do opposite, to make it work with lexemes. Like for
>
sphinx can do stemming. as for searching for prefixes, that's optional.

Fred
06cc84260bff88e4217a2f8843b03b9d?d=identicon&s=25 VP (Guest)
on 2008-12-04 16:21
(Received via mailing list)
thanks for the tips. As documentation thinksphinx has nice railscast
about. But geocoding searches for example are not documented until now
(or at least, until last week hehe)

Regards,

Victor

On Dec 4, 3:02 pm, Frederick Cheung <frederick.che...@gmail.com>
This topic is locked and can not be replied to.