Issues with : in the content

Hi,

I discovered ferret and aaf this evening. I’ve just run some tests
and it seems perfect for my needs.
I’m indexing text data (title, description, etc.) and also Ethernet
hardware (MAC) addresses.

Sorry if that sounds trivial, but I can’t find a way to index MAC
addresses so that searches on them work correctly.

If I do something like this:

require 'ferret'
include Ferret

index = Index::Index.new
index << {:hwaddr => '00:11:22:33:44:55'}
index.search_each('"11:11"') do |id, score|
  puts "Document #{id} found with a score of #{score}"
end

it matches.
If I search for ‘11:11’, it also matches.
If the search is ‘0011’ or ‘1122*’, it does not match.
If hwaddr = ‘00z11z22z33z44z55’, it works as expected.
I tried an untokenized index, but that didn’t help.

Should I escape : before indexing? (That’s not convenient.)
Should I use another Analyzer?

Any help would be appreciated.

Thanks in advance.

Hey …

what you should do is to write your own analyzer… that splits
the HWAddress at the : and therefore stores each part of
the MAC address as a separate token… this can be done using
the RegExpAnalyzer … maybe like that:

RegExpAnalyzer.new(/[^:]+/, true) [1]
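
As a quick check outside Ferret, the same pattern can be run through plain Ruby's String#scan to see which tokens it would yield. This is only a sketch: mac_tokens is a made-up helper name, and the downcase call stands in for the lowercasing that the second argument is meant to switch on.

```ruby
# Hypothetical helper (not part of Ferret): split a MAC address on ':'
# the way the pattern /[^:]+/ would, lowercasing each piece.
def mac_tokens(hwaddr)
  hwaddr.scan(/[^:]+/).map(&:downcase)
end

p mac_tokens('00:11:22:33:44:55')  # => ["00", "11", "22", "33", "44", "55"]
p mac_tokens('00:1A:2B:3C:4D:5E')  # => ["00", "1a", "2b", "3c", "4d", "5e"]
```

Each two-character group ends up as its own token, which is what makes it possible to match on individual parts of the address.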

I would then use SpanNearQueries [2] to search for certain
MAC parts in a specific order… like that

query = SpanNearQuery.new(:slop => 5, :in_order => true)
query << SpanTermQuery.new(:hwaddr, "11")
query << SpanTermQuery.new(:hwaddr, "22")

this should find all items containing 11 followed by 22

Hope that helps …

Ben

[1] http://ferret.davebalmain.com/api/classes/Ferret/Analysis/RegExpAnalyzer.html
[2] http://ferret.davebalmain.com/api/classes/Ferret/Search/Spans/SpanNearQuery.html

Benjamin K. writes:

> MAC parts in a specific order… like that
>
> query = SpanNearQuery.new(:slop => 5, :in_order => true)
> query << SpanTermQuery.new(:hwaddr, "11")
> query << SpanTermQuery.new(:hwaddr, "22")
>
> this should find all items with 1122
>
> Hope that helps …

Hey it does. Thanks.
I first thought it was a bug, and I would have liked an easier solution
(e.g. stopping the Analyzer from treating ‘:’ as a stop word).

I don’t think I need to use the RegExpAnalyzer for hwaddr since the
Standard one also cuts on ‘:’.
I’m going to use :slop=>1, :in_order => true
And I’ll try to detect hwaddr search queries to feed SpanNearQuery
accordingly by looking for ‘:’ in the query and see if the word before
‘:’ matches a fieldname. (if it doesn’t and looks like a hwaddr I’ll
feed SpanNearQuery)
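
A rough sketch of that detection step in plain Ruby, with no Ferret calls involved. The field list, the MAC_FRAGMENT pattern and the hwaddr_parts helper are all assumptions made up for illustration; the real field names would come from the index.

```ruby
# Hypothetical list of the field names defined in the index.
KNOWN_FIELDS = %w[title description hwaddr].freeze

# Two or more two-digit hex groups separated by ':', e.g. "11:22".
MAC_FRAGMENT = /\A\h{2}(?::\h{2})+\z/

# Returns the lowercased hex groups when the query looks like a MAC
# fragment and the word before the first ':' is not a known field name.
# Returns nil otherwise, so the query can go through the normal parser.
def hwaddr_parts(query)
  prefix = query.split(':', 2).first
  return nil if KNOWN_FIELDS.include?(prefix)
  return nil unless query.match?(MAC_FRAGMENT)
  query.downcase.split(':')
end

p hwaddr_parts('11:22')      # => ["11", "22"]
p hwaddr_parts('title:foo')  # => nil
```

When the result is not nil, each part could become one SpanTermQuery on :hwaddr inside the SpanNearQuery.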

Pretty sure that could be done in a nicer way. (don’t hesitate to make
suggestions :))

Also, if there are other ways to index MAC addresses without splitting
on :, I would be interested to read about them (especially if I can use
the query without too much processing).
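
One possible alternative, offered only as an assumption and not something confirmed in this thread: strip the separators before indexing and before searching, so the address is handled as a single 12-character string and a query like ‘0011*’ no longer contains a ‘:’. The normalize_mac helper below is hypothetical and uses plain Ruby only.

```ruby
# Hypothetical helper (not part of Ferret): drop ':' and '-' separators
# and lowercase the rest. Other characters, such as a trailing '*',
# are left untouched.
def normalize_mac(text)
  text.gsub(/[:\-]/, '').downcase
end

p normalize_mac('00:11:22:33:44:55')  # => "001122334455"
p normalize_mac('00:11*')             # => "0011*"
```

The same helper would have to be applied to both the indexed value and the incoming query, otherwise the two forms would not line up.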

Anyway, Thanks again for the quick answer.

[email protected] writes:

> Also if there’s other ways to index mac addresses without splitting on
> : I would be interested to read about them. (especially if I can use
> the query without too much processing)

Oh, in fact what I want to use is the WhiteSpaceAnalyzer for the
field ‘hwaddr’ … (it seems I missed this one before) :)