Adding a custom filter to the query

Hi all,
I’m trying to figure out how to add a filter to a search. I’ve
created the filter, basically copying the location filter from
http://blog.tourb.us/archives/ferret-and-location-based-searches. But
when I call Index.search and pass the filter in a hash with the
key :filter, I get an error saying it expects type Data, and I’m at
a loss as to what to check next. Any help would be greatly
appreciated. I’m sure I have a lot to learn, but some nudges in the
right direction would be wonderful.


Cheers,
Jordan F.

On 7/15/06, Jordan F. wrote:

Hi Jordan,
This is a bug that needs to be fixed. Please wait for the next
version of Ferret, or use the pure Ruby version in the meantime.

Cheers,
Dave

On 7/15/06, David B. wrote:

Oh, really…darn, it was kind of important. How do I force it to use
the pure Ruby version? How long until the next version? Is it a
complicated fix, or is it fixed in a version I could access (SVN
or something)?


Cheers,
Jordan F.

On 7/15/06, David B. wrote:

Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

Don’t apologize, man; you’ve done an exceptional job with it so far.
The filter I was trying to add would filter based on location, so I’m
not sure it could be done easily using a QueryFilter. It takes a
latitude, longitude, and radius, then filters for records that are
within the radius…think that’s doable with the built-in filters? I
guess I could do it with a bounding box instead, but I’d prefer to
keep it accurate…Anyway, I’ll try the rferret route for now, and
hopefully by the time this application goes to production, the C
version will be fixed up. Thanks for your help.
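To illustrate the accuracy trade-off mentioned above, here is a small plain-Ruby sketch (not Ferret code) comparing a bounding-box test with a true radius test. It assumes latitude and longitude can be treated as flat x/y coordinates, the same simplification used in the post-filtering code later in this thread; the method names `in_box?` and `in_radius?` are hypothetical.

```ruby
# Hypothetical helpers comparing a bounding-box check with a radius check.
# Coordinates are treated as flat x/y values (no spherical geometry).

# True if the point lies inside the square that encloses the circle.
def in_box?(lat, lon, center_lat, center_lon, radius)
  (lat - center_lat).abs <= radius && (lon - center_lon).abs <= radius
end

# True if the point lies inside the circle itself.
def in_radius?(lat, lon, center_lat, center_lon, radius)
  (lat - center_lat) ** 2 + (lon - center_lon) ** 2 <= radius ** 2
end

# A point near a corner of the box: inside the box, outside the circle.
corner = [0.9, 0.9]
puts in_box?(*corner, 0.0, 0.0, 1.0)     # prints true
puts in_radius?(*corner, 0.0, 0.0, 1.0)  # prints false
```

A bounding box can still be useful as a cheap first pass, with the exact radius test applied only to the points that survive it.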


Cheers,
Jordan F.

On 7/16/06, Jordan F. wrote:

That is a perfect example of what you can’t use the QueryFilter for. I
may even use it as an example in the documentation. Thanks and good
luck with the pure Ruby version.

Cheers,
Dave

On 7/16/06, Jordan F. wrote:

To force it to use the pure Ruby version, require 'rferret' instead of
'ferret'. Alternatively (I should have mentioned this the first time)
you can use a QueryFilter. For example:

filter = QueryFilter.new(TermQuery.new(Term.new("subject", "sport")))

You should be able to build pretty much any filter you need just like
that. Hope that helps.

Cheers,
Dave

PS: The fix can’t be checked out of svn yet. I still have a lot of
work to do. Sorry.

On 7/16/06, David B. wrote:

I tried out the pure Ruby version, but I’m having a little trouble
wrapping my head around how to write the filter. It seems that some
things have changed internally since the sample code I found at
tourb.us was written. I tried looking at the RangeFilter code, but
it solves too different a problem to be useful as a guide. Do you
know of any other filters, or have any pointers on how I would go
about writing this one? It seems really simple: it just does a
calculation on two of the fields. But because it’s not iterating
through terms, the RangeFilter code doesn’t offer me much help. If
you offer some pointers and I manage to get it working, I’d be happy
to send you a copy to use as a sample, though it seems like the kind
of thing you’d probably be able to write in a few minutes…


Cheers,
Jordan F.

On 7/17/06, Jordan F. wrote:

I think I slightly misunderstood your problem the first time around.
To create this filter, you actually have to iterate through every
document in the index. This takes some time, but it is worth it if
the filter gets used many times, since filters are cached. However,
I don’t think this would work for you, because I’m guessing the
latitude, longitude and radius change on a query-by-query basis. That
is not really what the current filters are designed for. Filters
should be common query restrictions that are run over and over again.
For example, a blog may have a month filter for retrieving documents
from a particular month. Such a filter is likely to be used over and
over again, and RangeFilters are pretty cheap to build.
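The caching argument can be sketched in plain Ruby. The `CachedFilter` class and its `bits` and `scans` methods below are hypothetical and are not Ferret's filter API: the class scans every document once to build an array of booleans and caches that array under a key. A month filter reuses the same key across queries, while a location filter keyed on latitude, longitude and radius gets a fresh key, and therefore a full scan, for nearly every query.

```ruby
# Hypothetical sketch of a cached filter: one full scan per distinct key.
class CachedFilter
  attr_reader :scans

  def initialize(&predicate)
    @predicate = predicate   # decides whether a document passes
    @cache = {}              # key => Array of booleans, one per document
    @scans = 0               # how many full scans have been performed
  end

  # Returns the match array for +docs+, scanning only on a cache miss.
  def bits(docs, key)
    @cache[key] ||= begin
      @scans += 1
      docs.map { |doc| @predicate.call(doc, key) }
    end
  end
end

docs = [
  { month: "2006-06", lat: 1.0, lon: 1.0 },
  { month: "2006-07", lat: 5.0, lon: 5.0 },
]

# Month filter: the same key recurs, so the scan happens once.
by_month = CachedFilter.new { |doc, month| doc[:month] == month }
by_month.bits(docs, "2006-07")
by_month.bits(docs, "2006-07")
puts by_month.scans   # prints 1

# Location filter: each new (lat, lon, radius) is a new key and a new scan.
near = CachedFilter.new do |doc, (lat, lon, r)|
  (doc[:lat] - lat) ** 2 + (doc[:lon] - lon) ** 2 < r ** 2
end
near.bits(docs, [1.0, 1.0, 2.0])
near.bits(docs, [5.0, 5.0, 2.0])
puts near.scans       # prints 2
```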

So the current solution to your problem is to post-filter your query
results yourself (i.e., filter the results once you have them back).
Let’s say you need ten results. You’d search for maybe 50, run
through each result checking the distance, and discard the ones you
don’t need. You’d repeat the search until you found enough documents.
Here is a quick and dirty solution (where num_docs is the number of
documents you want in your result set):

def search(index, query, num_docs, latitude, longitude, radius)
    batch_size = num_docs * 5    # candidates fetched per pass
    first_doc = 0
    results = []
    loop do
        seen = 0                 # hits returned in this pass
        index.search_each(query,
                          :first_doc => first_doc,
                          :num_docs => batch_size) do |doc_id, score|
            seen += 1
            doc = index[doc_id]
            # convert stored field values to numbers before doing the math
            d_lat = doc[:latitude].to_f - latitude
            d_lon = doc[:longitude].to_f - longitude
            # test distance and add to the result set if ok
            results << doc if d_lat ** 2 + d_lon ** 2 < radius ** 2
            break if results.size == num_docs  # have enough docs
        end
        break if results.size == num_docs      # have enough docs
        break if seen < batch_size             # already scanned all results
        first_doc += batch_size
    end
    return results
end

This gets even messier when you need to page through the results. A
much nicer solution than this would be to add a :filter_proc option to
the search methods. Something like this:

within_radius = lambda do |doc|
    return ((doc[:latitude].to_f - latitude) ** 2 +
            (doc[:longitude].to_f - longitude) ** 2) < (radius ** 2)
end

index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
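The :filter_proc option is only a proposal at this point, so the following is a plain-Ruby sketch of the idea, run over an in-memory array of hashes rather than a Ferret index. The `filtered_search_each` method and its `:filter_proc` and `:num_docs` handling are hypothetical stand-ins. The point is that the proc is applied while hits are being collected, so the caller no longer needs the batch-and-retry loop.

```ruby
# Hypothetical in-memory stand-in for a search method that accepts
# a :filter_proc option. +hits+ is an Array of [doc, score] pairs,
# already sorted by score.
def filtered_search_each(hits, options = {})
  filter = options[:filter_proc]
  limit = options[:num_docs] || 10
  yielded = 0
  hits.each do |doc, score|
    next if filter && !filter.call(doc)  # drop documents the proc rejects
    yield doc, score
    yielded += 1
    break if yielded == limit            # stop once enough docs have passed
  end
  yielded
end

latitude, longitude, radius = 0.0, 0.0, 1.0
within_radius = lambda do |doc|
  (doc[:latitude] - latitude) ** 2 +
    (doc[:longitude] - longitude) ** 2 < radius ** 2
end

hits = [
  [{ id: 1, latitude: 0.5, longitude: 0.5 }, 0.9],
  [{ id: 2, latitude: 3.0, longitude: 3.0 }, 0.8],
  [{ id: 3, latitude: 0.1, longitude: 0.2 }, 0.7],
]

found = []
filtered_search_each(hits, :filter_proc => within_radius, :num_docs => 10) do |doc, score|
  found << doc[:id]
end
p found   # prints [1, 3]
```

Because filtering happens inside the loop that counts results, asking for ten documents yields ten matching documents (or all of them, if fewer match) without the caller guessing a batch size.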

Does this sound like a good idea? If so, I could add it to a future
version of Ferret. Please let me know if you can think of a better way
to do this.

Cheers,
Dave