Retrieving search result positions

Hi

I’m considering using Ferret in v2 of Weft QDA, a wxruby desktop
application for textual analysis in social science.

Ferret seems a very impressive package that meets and exceeds my
requirements, but I can’t find how to retrieve specific details about
the results.

I’d like to be able to run fairly simple queries. I then need to look at
each term match, and get its document id and the character (not byte)
position at which it occurs in the source document.

My semi-illiterate reading of search.c suggests this is available, but
looking at the SearchHits returned by a SpanTermQuery, they don’t seem
to contain the methods I’m looking for.

Thanks for any help.

alex

On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex F. wrote:

each term match, and get its document id and the character (not byte)
position at which it occurs in the source document.

My semi-illiterate reading of search.c suggests this is available, but
looking at the SearchHits returned by a SpanTermQuery, they don’t seem
to contain the methods I’m looking for.

Without fully understanding what you want to achieve, I guess
TermVectors are what you’re looking for. I’m not sure if they’re working
on characters or bytes, though.

Jens


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

Jens K. wrote:

On Wed, Mar 28, 2007 at 07:30:36PM +0100, Alex F. wrote:

I’d like to be able to run fairly simple queries. I then need to look at
each term match, and get its document id and the character (not byte)
position at which it occurs in the source document.

Without fully understanding what you want to achieve, I guess
TermVectors are what you’re looking for.

Thank you - that class has exactly the data I need. Is there any way to
extract the individual TermVectors implied by a set of search results?

#highlight seems to do this internally, but the only Ruby way I’ve found
to access TVs is via index.reader.term_vector(doc_id, :field). I’d
like to be able to find the terms in the results of, e.g., a fuzzy or
phrase search.

I’m not sure if they’re working
on characters or bytes, though.

Looks like bytes, but I can probably work around that.

thanks
alex
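
If the offsets do turn out to be byte offsets, one possible workaround is to convert them against the original source text. The following is only a minimal sketch: it assumes the source is UTF-8 and that the offset falls on a character boundary, and the helper name byte_to_char_offset is made up for illustration, not part of Ferret.

```ruby
# Convert a byte offset into a character offset for a UTF-8 string.
# Assumes byte_offset falls on a character boundary, as a token start should.
# byte_to_char_offset is a hypothetical helper, not a Ferret method.
def byte_to_char_offset(text, byte_offset)
  # Take the leading bytes, then count how many characters they make up.
  text.byteslice(0, byte_offset).length
end

source = "Krämer webit"
# "ä" takes two bytes in UTF-8, so "webit" starts at byte 8 but character 7.
char_start = byte_to_char_offset(source, 8)
puts source[char_start, 5]  # prints "webit"
```

For documents with many matches it may be cheaper to walk the text once and build a byte-to-character lookup table rather than slicing for every offset.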

Jens K. wrote:

index.search_each(query) do |doc_id, score|
  tv = index.reader.term_vector(doc_id, :field)
end

I’ll give it a try, but if it was a fuzzy match I’m not sure I would
know the exact term that was matched. Similarly with a phrase match, I
think I would have to manually verify that a particular occurrence of
one term met the phrase criteria.

thanks
alex
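
The manual phrase check described above could look roughly like this. It is a sketch under simplifying assumptions: the term positions have already been collected into a plain hash (term text mapped to token positions), the phrase must match exactly with no slop, and consecutive tokens differ in position by exactly 1. The name phrase_starts is made up for illustration.

```ruby
# positions_by_term maps term text to token positions,
# e.g. { "quick" => [1, 5], "brown" => [2] }.
# Returns the positions at which phrase_terms occur consecutively.
# phrase_starts is a hypothetical helper, not a Ferret method.
def phrase_starts(positions_by_term, phrase_terms)
  first, *rest = phrase_terms
  candidates = positions_by_term.fetch(first, [])
  candidates.select do |start|
    # Every following term must sit exactly one position further along.
    rest.each_with_index.all? do |term, i|
      positions_by_term.fetch(term, []).include?(start + i + 1)
    end
  end
end

# Token positions for "the quick brown fox the quick"
positions = { "the" => [0, 4], "quick" => [1, 5], "brown" => [2], "fox" => [3] }
p phrase_starts(positions, %w[quick brown])  # prints [1]
```

A fuzzy match is harder, since the matched term itself is unknown; this sketch does not attempt it.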

On Thu, Mar 29, 2007 at 07:28:36PM +0100, Alex F. wrote:

extract the individual TermVectors implied by a set of search results?

#highlight seems to do this internally, but the only Ruby way I’ve found
to access TVs is via index.reader.term_vector(doc_id, :field). I’d
like to be able to find the terms in the results of, e.g., a fuzzy or
phrase search.

You will get the doc_ids back from your search, so wouldn’t it work to
just do a search_each and retrieve the term vectors inside the block?

index.search_each(query) do |doc_id, score|
  # Fetch the term vector for the field of interest in each matching document
  tv = index.reader.term_vector(doc_id, :field)
end

Jens
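
Once a term vector is in hand, the per-term offsets can be collected along these lines. This is a sketch, not a confirmed recipe: it assumes the term vector exposes terms (each with text and positions) and an offsets array indexed by position, each entry having start and end, and that the field was indexed with positions and offsets stored. The Structs below are stand-ins so the example runs without Ferret, and offsets_by_term is a made-up helper name.

```ruby
# Stand-ins for the shape a Ferret term vector is assumed to have.
# With Ferret loaded, tv would instead come from
# index.reader.term_vector(doc_id, :field).
TVTerm     = Struct.new(:text, :positions)
TVOffsets  = Struct.new(:start, :end)
TermVector = Struct.new(:field, :terms, :offsets)

# Map each term's text to the [start, end] offsets of its occurrences.
# offsets_by_term is a hypothetical helper, not a Ferret method.
def offsets_by_term(tv)
  tv.terms.each_with_object({}) do |term, result|
    result[term.text] = term.positions.map do |pos|
      off = tv.offsets[pos]
      [off.start, off.end]
    end
  end
end

# Hand-built term vector for the text "the quick the"
tv = TermVector.new(
  :field,
  [TVTerm.new("quick", [1]), TVTerm.new("the", [0, 2])],
  [TVOffsets.new(0, 3), TVOffsets.new(4, 9), TVOffsets.new(10, 13)]
)
p offsets_by_term(tv)["the"]  # prints [[0, 3], [10, 13]]
```

Restricting the resulting hash to the terms of the original query would then give the match locations for a simple term search.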

