Adding new items to the index breaks searches with *

Hi, after upgrading to Ferret 0.10.1 and bleeding-edge aaf I’m getting
some strange behavior. Stability is generally much better with the new
version of Ferret, but when I add new items, for some reason I can no
longer search with a *. Or rather, I can, but it returns no results and
no errors. I can search and get results normally for other searches, and
when I rebuild the index I can search with * until I add a new item. Has
anyone else experienced this? I use * in my browse items page.

I think I have a fairly standard aaf setup.

Any ideas what might be going on here or what else to investigate or
what else to try?

Regards
Clare

On Tue, Aug 29, 2006 at 02:06:16AM +0200, Clare wrote:

Hi, after upgrading to Ferret 0.10.1 and bleeding-edge aaf I’m getting
some strange behavior. Stability is generally much better with the new
version of Ferret, but when I add new items, for some reason I can no
longer search with a *. Or rather, I can, but it returns no results and
no errors. I can search and get results normally for other searches, and
when I rebuild the index I can search with * until I add a new item. Has
anyone else experienced this? I use * in my browse items page.

Do you mean a query consisting only of ‘*’, or wildcard queries like
‘test*’? The former isn’t an allowed query, afaik. I don’t know why it
works before modifying the index. Here’s the snippet I used to reproduce
this behavior:

require 'rubygems'
require 'ferret'
include Ferret
i = I.new                        # create a fresh index
i << 'just some testing'         # add a first document
i.search('*').total_hits         # => 1
i << 'another testing session'   # add a second document
i.search('*').total_hits         # => 0 (expected 2)

Why don’t you just use find(:all) on your browse page?

Jens

webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

Hi, I could use a find(:all, :conditions => blah), but my browse page is
divided into types and categories, so I was using a wildcard search
with find_by_contents and then one or two filters depending on whether
the user selects a type or a type and category. I just thought that
Ferret would be faster than a find(:all) with conditions (also, I’m
already using it on my search page and the browse page has similar
functionality). Is this not so? The conditions would be an exact match
on the full contents of a db cell. Would Ferret still be faster with
this?
So what I basically want to do is a simple search on one or two fields.
How is this done with acts_as_ferret? How do you specify which fields
of the index to search on?
Thanks for any advice,
regards
Clare


Hi!

On Tue, Aug 29, 2006 at 03:49:10PM +0200, Clare wrote:

Hi, I could use a find(:all, :conditions => blah), but my browse page is
divided into types and categories, so I was using a wildcard search
with find_by_contents and then one or two filters depending on whether
the user selects a type or a type and category. I just thought that
Ferret would be faster than a find(:all) with conditions (also, I’m
already using it on my search page and the browse page has similar
functionality). Is this not so? The conditions would be an exact match
on the full contents of a db cell. Would Ferret still be faster with
this?

As long as there’s no user-entered search term, just categories and
types, you might be better off using the db directly.

With a find statement like
find(:all, :conditions => ["category=?", params[:category]])
speed won’t differ much as long as you have an index in your db on the
category column.
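A minimal sketch of the db route for a browse page filtered by a type and, optionally, a category. It only builds the :conditions array for a find(:all, :conditions => ...) call like the one above. The column names item_type and category are assumptions (a column named plain type would clash with Rails’ single-table inheritance), and the helper name is hypothetical.

```ruby
# Hypothetical helper: builds an ActiveRecord :conditions array for a
# browse page. Column names item_type and category are assumptions.
def browse_conditions(item_type, category = nil)
  sql = ['item_type = ?']
  values = [item_type]
  # Form params often arrive as "" rather than nil, so treat both as "no filter".
  if category && !category.empty?
    sql << 'category = ?'
    values << category
  end
  [sql.join(' AND ')] + values
end

p browse_conditions('book')             # => ["item_type = ?", "book"]
p browse_conditions('book', 'fiction')  # => ["item_type = ? AND category = ?", "book", "fiction"]
```

With a hypothetical Item model this would be used as Item.find(:all, :conditions => browse_conditions(params[:type], params[:category])); with db indexes on both columns it stays a plain indexed lookup.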

In general, Ferret tends to be faster when it comes to searching longer
texts.

Also keep in mind that acts_as_ferret always fetches the records from the
db anyway, using the ids it has found through Ferret.
So aaf can only be faster if Ferret needs less time for searching than
the time difference between these two statements:
select * from … where id in(…) (what aaf does with the ids it found)
and
select * from … where category='' (what you would do when using
find(:all …))
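The rule of thumb above as a one-line check: aaf pays for the Ferret search plus the id lookup, while plain find(:all) pays only for the category query. The method name and the timings are made up for illustration; you would plug in measurements from your own db and index.

```ruby
# Hypothetical check: is the aaf route (Ferret search + "where id in (...)")
# cheaper than a direct "where category = ..." query? Times in milliseconds.
def aaf_faster?(ferret_search_ms:, select_by_ids_ms:, select_by_category_ms:)
  # Equivalent to: ferret_search_ms < select_by_category_ms - select_by_ids_ms
  ferret_search_ms + select_by_ids_ms < select_by_category_ms
end

p aaf_faster?(ferret_search_ms: 5, select_by_ids_ms: 12, select_by_category_ms: 15)  # => false
p aaf_faster?(ferret_search_ms: 2, select_by_ids_ms: 10, select_by_category_ms: 15)  # => true
```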

So what I basically want to do is a simple search on one or two fields.
How is this done with acts_as_ferret? How do you specify which fields
of the index to search on?

In your query, prefix the term with the field name, e.g. “title:test”
will only retrieve records where the term test occurs in the title
field.
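A sketch of turning the browse filters into a field-scoped query string for find_by_contents. It assumes hypothetical index fields item_type and category, and Lucene-style query syntax in which a leading + marks a required clause and double quotes group a phrase. Values are simply stripped of double quotes rather than properly escaped, so treat it as an illustration, not a hardened query builder.

```ruby
# Hypothetical helper: builds a query like +item_type:"book" +category:"x".
# Nil or blank filter values are skipped; embedded double quotes are removed.
def field_query(filters)
  filters.reject { |_field, value| value.nil? || value.empty? }
         .map { |field, value| %(+#{field}:"#{value.delete('"')}") }
         .join(' ')
end

puts field_query(item_type: 'book', category: 'science fiction')
# prints: +item_type:"book" +category:"science fiction"
```

With a hypothetical Item model this would be passed on as Item.find_by_contents(field_query(:item_type => params[:type], :category => params[:category])).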

cheers,
Jens



Posted via http://www.ruby-forum.com/.


Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


As a side note, I’d like to mention that Ferret tends to be faster than
MySQL on columns/fields with short texts too, if there are many, many
records. For example, I had this experience with a MySQL database with
millions of rows of short varchars in its columns. MySQL, even when
optimized by an experienced DBA, with all necessary indices set and
queries that EXPLAINed as optimized, had quite some problems handling
lots of queries, while Ferret did a damn good job querying the same
data. Ferret / Lucene gave me quite a sense of achievement with this.
That said, it is quite uncommon to have millions of categories, so
Jens’ suggestion seems very reasonable on this point.

Cheers,
Jan

On 8/29/06, Jens K. [email protected] wrote:


Thanks for the snippet, Jens. This was a bug (quite a serious one)
which I have now fixed. As Jens said, “*” queries were not a good idea
and would fail on most indexes because of the number of terms (they got
expanded as MultiTermQueries with every single term in the index).
However, I’ve now modified the QueryParser to translate “*” to a
MatchAllQuery, so there should be no problem, performance or otherwise,
with using “*” in your queries.
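A toy model of why the old expansion hurt. This is a plain-Ruby inverted index (the ToyIndex class and its methods are hypothetical, not Ferret’s implementation) in which the old-style “*” becomes one clause per distinct term, while a MatchAll-style “*” just returns every document id. Fed the same two strings as Jens’ snippet, both return the same hits; the difference is that the clause count grows with the vocabulary of the index.

```ruby
require 'set'

# Hypothetical toy inverted index (not Ferret's API): term => Set of doc ids.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
    @doc_count = 0
  end

  # Tokenise on whitespace and record which document each term occurs in.
  def add(text)
    text.downcase.split.each { |term| @postings[term] << @doc_count }
    @doc_count += 1
  end

  # Old-style "*": one clause per distinct term, OR-ed together.
  # Returns [matching ids, number of clauses that had to be evaluated].
  def wildcard_all
    clauses = @postings.keys
    hits = clauses.reduce(Set.new) { |acc, term| acc | @postings[term] }
    [hits, clauses.size]
  end

  # MatchAll-style "*": every document id, no term expansion at all.
  def match_all
    Set.new(0...@doc_count)
  end
end

index = ToyIndex.new
index.add('just some testing')
index.add('another testing session')

hits, clauses = index.wildcard_all
p hits.to_a.sort             # => [0, 1]
p clauses                    # => 5
p index.match_all.to_a.sort  # => [0, 1]
```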

I should note here that “title:*” will match all documents, including
documents that don’t have a :title field. If you only want documents
with a :title field you should use “title:?*”. Having said that, if
you are using these types of queries there is probably a better way to
do what you are doing.
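The same kind of toy model for “title:*” versus “title:?*”, in plain Ruby with made-up documents (not Ferret code). The first behaves like a match-all and also returns the document without a title; the second needs at least one term in the title field.

```ruby
# Made-up documents; the one at index 1 has no :title field.
docs = [
  { title: 'ferret tips', body: 'wildcards' },
  { body: 'no title here' },
  { title: 'aaf setup', body: 'rails' }
]

# "title:*" after the QueryParser change: behaves like a match-all query,
# so the field is ignored and every document id is returned.
def star_query(docs, _field)
  (0...docs.size).to_a
end

# "title:?*": "?" matches exactly one character and "*" any number, so any
# term of one or more characters matches; the field must contain a term.
def question_star_query(docs, field)
  docs.each_index.select do |id|
    text = docs[id][field]
    !text.nil? && !text.split.empty?
  end
end

p star_query(docs, :title)           # => [0, 1, 2]
p question_star_query(docs, :title)  # => [0, 2]
```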

Cheers,
Dave

Thanks very much for clarifying that, Jens. Much appreciated!
regards
Clare