Recently I’ve been revisiting some of my search code. With a greater
understanding of how Java Lucene implements its search methods, I
realized that one level of abstraction is not present in the Ferret
classes/methods. Here are the relevant method signatures:

Ferret’s search methods:

  in Ferret::Index::Index:
    search(query, options = {}) -> returns a TopDocs
    search_each(query, options = {}) {|doc, score| …} -> yields to
      context w/ doc and score for each hit

  in Ferret::Search::IndexSearcher:
    search(query, options = {}) -> returns a TopDocs
    search_each(query, filter = nil) {|doc, score| …} -> yields to
      context w/ doc and score for each hit

Lucene’s search methods:

  in the interface Searchable:
    public void search(Query query, Filter filter, HitCollector results)
    public TopDocs search(Query query, Filter filter, int n)
    public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)

  in org.apache.lucene.search.Searcher (which implements Searchable):
    public final Hits search(Query query)
    public Hits search(Query query, Filter filter)
    public Hits search(Query query, Sort sort)
    public Hits search(Query query, Filter filter, Sort sort)

I was wondering if there are plans to implement the Hits class in
Ferret. (Or, if someone were to write a patch implementing it, would
David integrate it into the source?) It seems like a useful
abstraction, since TopDocs does not allow you to access its hits by
index, only through the .each() method call.
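
To make the idea concrete, here is a rough sketch of what a Hits wrapper could look like. None of this is Ferret code: FakeTopDocs and ScoreDoc are hypothetical stand-ins so the example runs without Ferret installed, and the Hits class itself is only my guess at a reasonable shape. The only thing it assumes about TopDocs is that it exposes total_hits and an each that yields doc and score.

```ruby
# Hypothetical stand-ins for Ferret's result objects, defined here only so
# the sketch runs on its own. They are NOT the real Ferret classes.
ScoreDoc = Struct.new(:doc, :score)

class FakeTopDocs
  attr_reader :total_hits

  def initialize(score_docs, total_hits = score_docs.size)
    @score_docs = score_docs
    @total_hits = total_hits
  end

  # Each-only access, as described above: yields doc and score per hit.
  def each
    @score_docs.each { |sd| yield sd.doc, sd.score }
  end
end

# Hypothetical Hits class: copies the hits once so that they can be read
# by position as well as iterated.
class Hits
  include Enumerable

  Hit = Struct.new(:doc, :score)

  attr_reader :total_hits

  def initialize(top_docs)
    @total_hits = top_docs.total_hits
    @hits = []
    top_docs.each { |doc, score| @hits << Hit.new(doc, score) }
  end

  # Positional access; returns nil when the index is out of range.
  def [](index)
    @hits[index]
  end

  def length
    @hits.length
  end
  alias size length

  # Enumerable builds map, select, first, etc. on top of this.
  def each(&block)
    @hits.each(&block)
  end
end

top_docs = FakeTopDocs.new([ScoreDoc.new(7, 0.9), ScoreDoc.new(3, 0.4)])
hits = Hits.new(top_docs)
puts hits[0].doc   # first hit by position
puts hits.length
```

Including Enumerable keeps the existing each-style iteration working while adding indexed access on top, so code written against the each-only interface would not need to change.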
Some questions:
- Will changing these methods break people’s existing code?
- Where is the proper place to put these methods? Move the methods
that return TopDocs to a module, which is more or less the same as a
Java interface, and implement the methods that return Hits directly in
the class? What is a good way to do this that feels Rubyish and takes
advantage of Ruby’s strengths and idioms?
- The options to limit the search (first_doc and num_docs) in
Search::IndexSearcher and the code that implements them should
probably be moved out of Search::IndexSearcher into Index::Index.
- Are there lower level issues I am not aware of that would make any
of this a bad idea?
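
As a strawman for the module question and the paging question, something like the following is what I have in mind. Every name here (Searchable, search_top_docs, ToySearcher, ToyIndex) is made up for illustration, and the "search" is just substring counting over an array of strings, so please read it as a sketch of the layering and not as a proposal for the real implementation.

```ruby
# Hypothetical module playing the role of Lucene's Searchable interface.
# A class that includes it is expected to override search_top_docs.
module Searchable
  def search_top_docs(query, options = {})
    raise NotImplementedError, "#{self.class} must implement search_top_docs"
  end
end

# Toy low-level searcher: documents are plain strings and the doc id is
# the position in the array. It knows nothing about paging.
class ToySearcher
  include Searchable

  def initialize(docs)
    @docs = docs
  end

  # Returns [doc_id, score] pairs, best score first (ties broken by id).
  # The score is simply the number of times the query string occurs.
  def search_top_docs(query, options = {})
    matches = []
    @docs.each_with_index do |text, id|
      count = text.scan(query).size
      matches << [id, count.to_f] if count > 0
    end
    matches.sort_by { |id, score| [-score, id] }
  end
end

# Toy high-level index: the paging options are applied here instead of
# inside the searcher.
class ToyIndex
  def initialize(searcher)
    @searcher = searcher
  end

  def search(query, options = {})
    first_doc = options.fetch(:first_doc, 0)
    num_docs  = options.fetch(:num_docs, 10)
    # Array#slice returns nil when first_doc is past the end.
    @searcher.search_top_docs(query).slice(first_doc, num_docs) || []
  end
end

index = ToyIndex.new(ToySearcher.new(["ruby search", "java search", "ruby ruby"]))
p index.search("ruby")
p index.search("ruby", first_doc: 1, num_docs: 1)
```

One caveat with slicing at the top level: the searcher has to collect every match before the index trims the list. Lucene's search(Query, Filter, int n) passes the limit down, presumably so the collector can stay bounded, which may be one of the lower level issues asked about above.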
Am I missing something here? Are there reasons not to have Ferret’s
implementation of these methods and classes follow Java Lucene’s as
closely as possible? I’d appreciate hearing your thoughts.
-F