Inconsistent results when using wild card queries


#1

We get some unexpected results when using wildcard queries. We’re using
aaf (acts_as_ferret) and Ferret 0.11.4.

For example, when searching on part of a colleague’s name (kristofer)
and limiting it to a specific source_id:

Query: source_id:25 AND kri*
Result: 2 documents. Neither of them contains the word kristofer, only other
matching words, such as “kring” and “kringå” (Swedish).

Query: source_id:25 AND kris*
Result: 0 documents.

Query: source_id:25 AND krist*
Result: 12 documents. Works as expected.

The index contains about 200,000 documents in total, and I’ve tried
rebuilding and optimizing it, but the results stay the same.

Has anyone else experienced something similar? Any ideas how to fix it?

Thanks!

/David W.


#2

David W. removed_email_address@domain.invalid writes:

Hi,

Has anyone else experienced something similar? Any ideas how to fix it?

Unfortunately I’ve also experienced that kind of weirdness, and most of
the time it has to do with accented characters.
I’m unable to match a single é if I search for é (while it works
with wordwithé).
If I search for e it highlights a single e, but it doesn’t for a single
a…

Sorry to say it, but at the moment I’m considering switching to another
search engine (since I also have very weird, unresolved issues with
highlighting).
I’m looking at Xapian at the moment.


#3

On Thu, Jul 05, 2007 at 11:05:36AM +0200, removed_email_address@domain.invalid wrote:

David W. removed_email_address@domain.invalid writes:

Hi,

Has anyone else experienced something similar? Any ideas how to fix it?

Unfortunately I’ve also experienced that kind of weirdness, and most of
the time it has to do with accented characters.
I’m unable to match a single é if I search for é (while it works
with wordwithé).

I don’t know if this is acceptable for you in terms of result exactness,
but you might consider replacing accented characters with their
ASCII counterparts during analysis.
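
A minimal sketch of that idea in plain Ruby, not tied to Ferret’s analyzer API: it decomposes each character with Unicode normalization and drops the combining marks, so “wordwithé” and “wordwithe” end up as the same term. The helper name `fold_accents` is hypothetical, and the sketch assumes the same folding is applied both to the text being indexed and to the query terms.

```ruby
# Hypothetical helper (not part of Ferret or acts_as_ferret).
# NFD splits "é" into "e" plus a combining accent; the gsub then removes
# every combining mark (Unicode category Mn), leaving the base letter.
def fold_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts fold_accents('wordwithé')
puts fold_accents('kringå')
```

Letters that have no canonical decomposition, such as ß or ø, pass through unchanged and would need an explicit mapping table.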

If I search for e it highlights a single e, but it doesn’t for a single
a…

Wild guess: maybe this is because ‘a’ is a stopword and ‘e’ isn’t?
In general, highlighting ‘e’ works, as does highlighting ‘a’, as long as
you use an analyzer with an empty stopword list:

require 'ferret'
include Ferret

# In-memory index whose analyzer has an empty stopword list
i = I.new :analyzer => Analysis::StandardAnalyzer.new([])
i << 'A tree in the woods'
i << 'Some sentence with e'

i.highlight 'a', 0, :field => :id
# => ["A tree in the woods"]

i.highlight 'e', 1, :field => :id
# => ["Some sentence with e"]

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
removed_email_address@domain.invalid | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa


#4

Jens K. removed_email_address@domain.invalid writes:

Has anyone else experienced something similar? Any ideas how to fix it?

Unfortunately I’ve also experienced that kind of weirdness, and most of
the time it has to do with accented characters.
I’m unable to match a single é if I search for é (while it works
with wordwithé).

I don’t know if this is acceptable for you in terms of result exactness,
but you might consider replacing accented characters with their
ASCII counterparts during analysis.

Thanks for your quick answers, Jens.
It could be acceptable, but the highlighting problems I’ve discovered
are stopping me from doing any further development.
Unfortunately I don’t have time to fix them myself, and Dave seems very
busy. :(

Sorry if it sounds like whinging :)

Cheers


#5

On Thu, Jul 05, 2007 at 12:19:52PM +0200, removed_email_address@domain.invalid wrote:

ASCII counterparts during analysis.

Thanks for your quick answers, Jens.
It could be acceptable, but the highlighting problems I’ve discovered
are stopping me from doing any further development.
Unfortunately I don’t have time to fix them myself, and Dave seems very
busy. :(

If you really want to switch, have you considered acts_as_solr? Its API is
much like aaf’s.

Jens




#6

On Thu, Jul 05, 2007 at 04:55:59PM +0200, removed_email_address@domain.invalid wrote:

Jens K. removed_email_address@domain.invalid writes:

If you really want to switch, have you considered acts_as_solr? Its API is
much like aaf’s.

I certainly would if I were OK with using Java. :) (But I’m not.)

AFAIR you need no Java skills to get Solr running; however, you’ll need
some spare server resources, that’s for sure ;)

At the moment, I’m considering Hyper Estraier and Xapian.
If there were a Python API + Rails plugin (and also as many features
as Ferret), that would be perfect :)

Solr has an HTTP interface, so talking to it from Python would be no big
deal.
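
To give a rough idea of what talking to Solr over HTTP involves, here is a sketch in Ruby that only builds the request URL for the kind of query discussed in this thread; any language with an HTTP client could send it. The host, port and `/solr/select` path are assumptions based on a default local Solr install, and `solr_select_url` is a hypothetical helper, not part of acts_as_solr.

```ruby
require 'uri'

# Hypothetical helper: builds a Solr select URL.
# The base URL is an assumption (default local install); adjust as needed.
def solr_select_url(query, filter: nil, rows: 10,
                    base: 'http://localhost:8983/solr/select')
  params = { 'q' => query, 'rows' => rows }
  params['fq'] = filter if filter   # optional filter query, e.g. a source_id
  uri = URI.parse(base)
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts solr_select_url('krist*', filter: 'source_id:25')
```

Fetching that URL with any HTTP client returns the search results, so no Java code is needed on the client side.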

Otherwise you could, possibly as the first user of Xapian in a
Rails app, start your very own acts_as_xapian ;)

Jens




#7

Jens K. removed_email_address@domain.invalid writes:

If you really want to switch, have you considered acts_as_solr? Its API is
much like aaf’s.

I certainly would if I were OK with using Java. :) (But I’m not.)
At the moment, I’m considering Hyper Estraier and Xapian.
If there were a Python API + Rails plugin (and also as many features
as Ferret), that would be perfect :)
I haven’t really looked at or tested them yet :)