[ANN] Ferret 0.10.13 released

david · October 20, 2006, 3:48pm

Hi Folks,

I’ve just released Ferret 0.10.13 (skip 0.10.12, it was a bad build).
There are two interesting additions in this release. You can now
access the Filter#bits method of the built-in filters, so you can
use them in your own code, possibly within your own custom filters.
For example, you could implement a custom filter like so:

class MultiFilter < Hash
  def bits(index_reader)
    bit_vector = Ferret::Utils::BitVector.new.not!
    filters = self.values
    filters.each {|filter| bit_vector.and!(filter.bits(index_reader))}
    bit_vector
  end
end

And you would use it like this:

mf = MultiFilter.new
mf[:category] = category_filter
mf[:country] = country_filter

# run the query with the filter
index.search(query, :filter => mf)

# filters can be changed and deleted
mf[:category] = new_category_filter
mf.delete(:country)
index.search(query, :filter => mf)
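
If the AND-ing in MultiFilter#bits is not obvious, here is a rough plain-Ruby sketch of the same idea. It does not use Ferret at all: Integers stand in for bit vectors, and DOC_COUNT, ALL_DOCS, combine and doc_ids are hypothetical names made up for the illustration. ALL_DOCS plays the role that BitVector.new.not! plays above, so that the first AND simply keeps the first filter's matches.

```ruby
# Plain-Ruby stand-in for the bit-vector intersection; no Ferret required.
# Each Integer is treated as a bit set: bit n set means document n passes.
DOC_COUNT = 8                       # hypothetical index size
ALL_DOCS  = (1 << DOC_COUNT) - 1    # every bit set: "all documents pass"

# AND all filter bit sets together, starting from "everything passes".
def combine(filter_bits)
  filter_bits.inject(ALL_DOCS) { |acc, bits| acc & bits }
end

# Convert a bit set back into a list of document numbers.
def doc_ids(bits)
  (0...DOC_COUNT).select { |n| bits[n] == 1 }
end

category = 0b0110_1100   # documents 2, 3, 5 and 6
country  = 0b0100_1010   # documents 1, 3 and 6

p doc_ids(combine([category, country]))  # only documents in both sets
p doc_ids(combine([]))                   # no filters: nothing is excluded
```

Starting from "all bits set" rather than "no bits set" is what makes an empty MultiFilter behave like no filter at all, and it means adding or deleting a filter never needs a special case.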

The other major addition is a MappingFilter (< TokenFilter). This can
be used to transform your text from UTF-8 to ASCII, for example. I
posted an example of how to do this earlier today. However, using the
mapping filter you can apply a list of string mappings rather
than just character mappings. Obviously you could achieve this with a
list of "String#gsub!"s, but MappingFilter will compile the mappings
into a DFA, so it will be a lot faster. Here is an example:

include Ferret::Analysis
class EuropeanAnalyzer
  MAPPING = {
    ['Å', 'Ä', 'À', 'A', 'Â', 'å', 'ä', 'à', 'â', 'a'] => 'a',
    ['Ö', 'Ô', 'ô', 'ö'] => 'o',
    ['É', 'È', 'Ê', 'Ë', 'é', 'è', 'ê', 'ë'] => 'e',
    ['Ü', 'ü', 'ù'] => 'u',
    ['ç'] => 'c'
  }
  def token_stream(field, string)
    return MappingFilter.new(StandardTokenizer.new(string), MAPPING)
  end
end
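
To get a feel for what such a mapping does to a piece of text without installing Ferret, here is a small plain-Ruby approximation. It is only a sketch of the mapping semantics, not how MappingFilter works internally: it replaces the chain of gsub! calls with one alternation regexp and a lookup hash, which is the nearest stock-Ruby analogue of a single compiled pass. SAMPLE_MAPPING (a shortened copy of the table above), LOOKUP, PATTERN and fold are hypothetical names.

```ruby
# Plain-Ruby approximation of the mapping semantics; no Ferret required.
# Shortened copy of the mapping table, in the same {[keys] => value} shape.
SAMPLE_MAPPING = {
  ['Å', 'Ä', 'À', 'Â', 'å', 'ä', 'à', 'â'] => 'a',
  ['Ö', 'Ô', 'ô', 'ö'] => 'o',
  ['É', 'È', 'Ê', 'Ë', 'é', 'è', 'ê', 'ë'] => 'e',
  ['Ü', 'ü', 'ù'] => 'u',
  ['ç'] => 'c'
}

# Flatten {[keys] => replacement} into {key => replacement}.
LOOKUP = SAMPLE_MAPPING.each_with_object({}) do |(keys, replacement), table|
  keys.each { |key| table[key] = replacement }
end

# One alternation built from every key; longer keys go first so that a
# multi-character key would win over a shorter key it starts with.
PATTERN = Regexp.union(LOOKUP.keys.sort_by { |key| -key.length })

# Apply every mapping in a single pass over the text.
def fold(text)
  text.gsub(PATTERN, LOOKUP)
end

puts fold('Ärger über façade')
```

Because the keys are plain strings, the same shape also covers multi-character mappings, which is the part a character-by-character translation cannot do.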

Happy Ferreting and check the Ferret homepage[1] if you are able to
contribute.

Cheers,
Dave

[1] http://ferret.davebalmain.com/trac/