Index returns all results for specific queries

Hey all,

I’m getting some really weird results when searching documents. It
seems to be somehow related to the document format I’m using.

I wrote a small script to replicate it:


require 'rubygems'
require 'ferret'
include Ferret
index = Index::Index.new(:path => '/tmp/fooindex', :key => :id)

# dummy data

index << {:visibility=>"private", :type=>"media", :title=>"example title",
          :owner=>"user/3003", :author=>"user/3003",
          :description=>"description example", :id=>"user/3003/media/1"}
index << {:visibility=>"private", :type=>"media", :title=>"a new title",
          :owner=>"user/3003", :author=>"user/3003",
          :description=>"more foo desc", :id=>"user/3003/media/2"}
index << {:visibility=>"private", :type=>"media", :title=>"random title",
          :owner=>"user/3003", :author=>"user/3003",
          :description=>"random description", :id=>"user/3003/media/4"}
index << {:visibility=>"private", :type=>"media", :title=>"random title",
          :owner=>"user/3003", :author=>"user/3003",
          :description=>"random description", :id=>"user/3003/media/5"}

# print every document that matches the query given on the command line
index.search_each(ARGV.shift) { |doc, score|
  puts index[doc].load.inspect
}

The following queries return every document currently in the index:

$ ruby script.rb "title:me"
{:author=>"user/3003", :description=>"description example",
:visibility=>"private", :id=>"user/3003/media/1", :title=>"example title",
:type=>"media", :owner=>"user/3003"}
… (remaining results)

$ ruby script.rb "title:my"
(same as above)

And, weirdly enough, the following

$ ruby script.rb "title:mo"

won’t return anything. There are more variants of this, but I think you
get my meaning.

The following works properly:

$ ruby script.rb "title:random"
(returns the two results that contain “random” in the title, which is
the expected behaviour)

Is there something I’m missing? It doesn’t seem to make sense to me
that those queries above should return all the results in the index,
especially considering they don’t actually match anything.

Any help is appreciated. Thanks.

On 3/13/07, Julio Cesar O. wrote:

> require 'rubygems'
> foo desc", :id=>"user/3003/media/2"}

Thanks for including the script. It makes my job much easier. :)

> And, weirdly enough, the following
>
> $ ruby script.rb "title:mo"
>
> won’t return anything. There are more variants of this, but I think you
> get my meaning.

The problem is that ‘me’ and ‘my’ are stop words. When they get
removed the query becomes ‘title:’, which is invalid. By default Ferret
catches query parse exceptions and attempts to parse the query as a
simple boolean term query, removing all special characters, so this
query then becomes ‘title’. Since the word ‘title’ occurs in the title
field of every document, all documents are returned. So I don’t think
this is a bug, but it is definitely undesired behaviour. I’ll try to
think of a better way to parse this.
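
To make that sequence concrete, here is a minimal pure-Ruby sketch that only simulates the steps described above. It does not call Ferret and is not Ferret's implementation: the `STOP_WORDS` list is a tiny illustrative subset, and `strip_stop_words`, `parse` and `parse_with_fallback` are hypothetical helper names made up for this example.

```ruby
# Simulation only: NOT Ferret's code. STOP_WORDS is an illustrative subset,
# not Ferret's real stop-word list.
STOP_WORDS = %w[a an and me my the].freeze

# Drop stop words from the term part of a "field:term" query.
def strip_stop_words(query, stop_words = STOP_WORDS)
  field, term = query.split(':', 2)
  kept = term.to_s.split.reject { |word| stop_words.include?(word.downcase) }
  "#{field}:#{kept.join(' ')}"
end

# Stand-in for the query parser: a field with no term is a parse error.
def parse(query)
  raise ArgumentError, "invalid query: #{query.inspect}" if query =~ /:\s*\z/
  query
end

# Fallback as described in the post: on a parse error, strip the special
# characters and treat what is left as plain search terms.
def parse_with_fallback(query, stop_words = STOP_WORDS)
  stripped = strip_stop_words(query, stop_words)
  parse(stripped)
rescue ArgumentError
  stripped.gsub(/[^\w\s]/, '').strip
end

puts parse_with_fallback('title:me')      # title
puts parse_with_fallback('title:mo')      # title:mo
puts parse_with_fallback('title:me', [])  # title:me
```

In this sketch ‘title:me’ degrades to the bare term ‘title’, which is why every document matches, while ‘title:mo’ survives as a normal field query that simply matches nothing. With an empty stop list the term is never removed, so the fallback is never reached.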

In the meantime, you may want to think about changing the stopword
list or removing stopwords altogether to prevent this problem from
occurring.

Thanks David,

I instantiated a StandardAnalyzer, passing an empty array as the stop
word list, and it did the trick.

If anyone wants to comment on what I’m losing by doing this, it would
be really nice.