On 7/17/06, Jordan F. wrote:
at tourb.us was written. I tried looking at the RangeFilter code, but
it seems to be solving too different a problem to really be useful as
a guide. Do you know of any other filters, or have any pointers to how
I would go about writing this filter? It seems really simple, just
does a calculation on two of the fields, but because it’s not
iterating through terms, the RangeFilter code doesn’t offer me much
help. If you offer some pointers and I manage to get it working, I’d
be happy to send you a copy to use as a sample, though it seems like
the kind of thing you’d probably be able to write in a few minutes…

I think I slightly misunderstood your problem the first time around.
To create this filter, you actually have to iterate through every
document in the index. This will take some time but it would be worth
it if the filter gets used many times, since it gets cached. However,
I don’t think this would work for you because I’m guessing the
longitude, latitude and radius change on a query-by-query basis. This
is not really what the current filters are designed for. Filters
should be common query restrictions that are run over and over again.
For example, a blog may have a month filter for retrieving documents
from a particular month. This is likely to be used over and over again
and RangeFilters are pretty cheap to build.
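To make the trade-off concrete, here is a minimal plain-Ruby sketch of the idea behind a cached filter. It does not use Ferret's API: `CachedFilter` and the array of hashes standing in for the index are assumptions made purely for illustration.

```ruby
require 'set'

# Hypothetical stand-in for a cached filter; not part of Ferret.
class CachedFilter
  attr_reader :builds

  def initialize(&predicate)
    @predicate = predicate
    @doc_ids = nil
    @builds = 0 # how many times the full scan has run
  end

  # Walks every document once, then reuses the resulting set of ids.
  def doc_ids(docs)
    @doc_ids ||= begin
      @builds += 1
      ids = Set.new
      docs.each_with_index { |doc, id| ids << id if @predicate.call(doc) }
      ids
    end
  end
end

# Illustrative data only; a real index would hold the documents.
docs = [
  { :title => "a", :month => "200606" },
  { :title => "b", :month => "200607" },
  { :title => "c", :month => "200607" },
]

july = CachedFilter.new { |doc| doc[:month] == "200607" }
july.doc_ids(docs) # the full scan happens here
july.doc_ids(docs) # served from the cached set
```

The scan cost is paid once per filter, which only helps when the same restriction comes back. A predicate built from per-query coordinates would be rebuilt on nearly every search, so it would gain nothing from caching.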

So the current solution to your problem is to post-filter your query
results yourself (i.e., filter the results once you have them back).
Let’s say you need ten results. You’d search for maybe 50, run through
each result checking the distance, and discard the ones you don’t
need. You’d repeat the search, starting further down the hit list each
time, until you had found enough documents. Here is a quick and dirty
solution (where num_docs is the number of documents you want in your
result set):

def search(index, query, num_docs, latitude, longitude, radius)
  return [] if num_docs <= 0
  batch_size = num_docs * 5
  first_doc = 0
  results = []
  while true
    seen = 0 # hits yielded by this search
    index.search_each(query,
                      :first_doc => first_doc,
                      :num_docs => batch_size) do |doc_id, score|
      seen += 1
      doc = index[doc_id]
      # test distance and add to resultset if ok (to_f in case the
      # stored field values come back as strings)
      if ((doc[:latitude].to_f - latitude) ** 2 +
          (doc[:longitude].to_f - longitude) ** 2) < radius ** 2
        results << doc
      end
      # have enough docs; return leaves the whole method, whereas break
      # would only leave this block
      return results if results.size >= num_docs
    end
    break if seen < batch_size # already scanned all results
    first_doc += batch_size
  end
  return results
end
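One caveat about the distance test above: it compares raw degree differences, which is only a rough approximation, because a degree of longitude covers less ground the further you are from the equator. If the radius is meant as a real distance, a great-circle calculation could replace it. The following is a sketch using the haversine formula. The helper names and the assumption that the radius is given in kilometres are mine, not part of the original code.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius; itself an approximation

# Great-circle distance in kilometres between two points given in degrees.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  # Clamp to 1.0 to guard against rounding error near antipodal points.
  2 * EARTH_RADIUS_KM * Math.asin([Math.sqrt(a), 1.0].min)
end

# Hypothetical drop-in for the distance test; to_f allows string field values.
def within_radius?(doc, latitude, longitude, radius_km)
  haversine_km(doc[:latitude].to_f, doc[:longitude].to_f,
               latitude, longitude) <= radius_km
end
```

For small radii away from the poles the simpler squared-difference test may be good enough, and it is cheaper to compute for every hit.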

This gets even messier when you need to page through the results. A
much nicer solution than this would be to add a :filter_proc option to
the search methods. Something like this:

within_radius = lambda do |doc|
  return ((doc[:latitude] - latitude) ** 2 +
          (doc[:longitude] - longitude) ** 2) < (radius ** 2)
end
index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
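To illustrate how such an option might behave, here is a plain-Ruby sketch. `FakeIndex` is a hypothetical in-memory stand-in, not Ferret's `Index` class, and `:filter_proc` is only the option proposed above, not an existing one.

```ruby
# Hypothetical in-memory stand-in for an index; not Ferret's API.
class FakeIndex
  def initialize(docs)
    @docs = docs
  end

  def [](doc_id)
    @docs[doc_id]
  end

  # Every document "matches" the query here, with a dummy score of 1.0.
  # Hits rejected by :filter_proc are dropped before :first_doc and
  # :num_docs are applied. Returns the number of hits that passed.
  def search_each(query, options = {})
    filter = options[:filter_proc] || lambda { |doc| true }
    first_doc = options[:first_doc] || 0
    num_docs = options[:num_docs] || 10
    passed = 0
    @docs.each_with_index do |doc, doc_id|
      next unless filter.call(doc)
      passed += 1
      next if passed <= first_doc
      yield doc_id, 1.0 if passed <= first_doc + num_docs
    end
    passed
  end
end

# Illustrative data only.
index = FakeIndex.new([
  { :name => "near", :latitude => 10.0, :longitude => 10.0 },
  { :name => "far",  :latitude => 50.0, :longitude => 50.0 },
  { :name => "edge", :latitude => 10.5, :longitude => 10.5 },
])

latitude, longitude, radius = 10.0, 10.0, 1.0
within_radius = lambda do |doc|
  ((doc[:latitude] - latitude) ** 2 +
   (doc[:longitude] - longitude) ** 2) < (radius ** 2)
end

names = []
index.search_each("any query", :filter_proc => within_radius) do |doc_id, score|
  names << index[doc_id][:name]
end
```

Because the filter runs inside the search in this sketch, `:first_doc` and `:num_docs` count only the hits that passed, which is what would make paging straightforward.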

Does this sound like a good idea? If so, I could add it to a future
version of Ferret. Please let me know if you can think of a better way
to do this.
Cheers,
Dave