Svn problems

I can consistently segfault the 0.10.4 gem, so I’m trying to get the
Subversion version working in the hope of tracking the problem down.

I have a fresh SVN checkout but:

a) the version (in ferret.rb) claims to be 0.9.6; and
b) Ferret::Index::FieldInfos and a couple other classes are missing at
run time. It looks like this is because they’re not exported in the C
extension (although I do see the corresponding C objects in the
code.)

Have I managed to acquire some outdated version of Ferret?

Thanks for any help!

On 9/24/06, William M. [email protected] wrote:

Have I managed to acquire some outdated version of Ferret?

Thanks for any help!

Hi William,

The 0.10.* series was developed in a different subversion repository.
You can check it out from:

$ svn co svn://www.davebalmain.com/exp ferret

If I have time today I might roll it into the original repository. I’m
not sure exactly how I’m going to do it though.

By the way, the 0.10.7 gem is out and it has all changes in it,
including the fix for your TermQuery problem.

Cheers,
Dave

Excerpts from David B.'s mail of 23 Sep 2006 (PDT):

The 0.10.* series was developed in a different subversion repository.
You can check it out from:

$ svn co svn://www.davebalmain.com/exp ferret

Thanks! See patch in following message.

By the way, the 0.10.7 gem is out and it has all changes in it,
including the fix for your TermQuery problem.

Sadly it doesn’t seem to fix the problem, but I’ll spend some more time
playing around now that I have the updated source.

On 9/25/06, William M. [email protected] wrote:

Sadly it doesn’t seem to fix the problem, but I’ll spend some more time
playing around now that I have the updated source.

Hi William,

Did you rebuild the index? You’ll need to do that before it makes any
difference.

Cheers,
Dave

Hi Dave,

Excerpts from David B.'s mail of 24 Sep 2006 (PDT):

Did you rebuild the index? You’ll need to do that before it makes any
difference.

Yes, the original example now works—thanks! Unfortunately, I still see
a lot of queries that return nothing in TermQuery form, but work fine in
String form.

For example:

(0..10).each do |j|
  m = @i[j][:message_id]
  n1 = @i.search(Ferret::Search::TermQuery.new(:message_id, m)).total_hits
  n2 = @i.search("message_id:#{m}").total_hits
  puts "#{m}: #{n1} #{n2}"
end
[email protected]: 0 1
[email protected]: 1 1
[email protected]: 1 1
[email protected]: 0 1
[email protected]: 0 1
[email protected]: 1 1
[email protected]: 1 1
[email protected]: 0 1
[email protected]: 1 1
[email protected]: 0 1
[email protected]: 0 1

Based on the first and third entries, I can’t imagine this is a
tokenization problem. What do you think?

Hi Dave,

Excerpts from David B.'s mail of 26 Sep 2006 (PDT):

You need to downcase the term when you add it to a TermQuery. The
StandardAnalyzer downcases all text so you need to do the same with
any terms you add to any hand built queries.

Thanks for the response. Downcasing the string passed into the TermQuery
does, in fact, retrieve the document. BUT, I had used a
WhitespaceAnalyzer with no downcasing on that field, so it should have
preserved case in the index.

In fact, some experimentation shows:

mid = "[email protected]"
i = Ferret::Index::Index.new
wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false
wsa.token_stream(:message_id, mid).next
=> token["[email protected]":0:26:1]
i.add_document({:message_id => mid}, wsa)
i.search(Ferret::Search::TermQuery.new(:message_id, mid))
=> #
i.search(Ferret::Search::TermQuery.new(:message_id, mid.downcase))
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#], max_score=0.3068528175354>

So it looks like WSA#token_stream does the right thing. Is it possible
it’s not actually being called at insertion time? Or am I
misunderstanding something?

On 9/27/06, William M. [email protected] wrote:

For example:
[email protected]: 0 1


William [email protected]

Hi William,

You need to downcase the term when you add it to a TermQuery. The
StandardAnalyzer downcases all text so you need to do the same with
any terms you add to any hand built queries.

One way to see what might possibly be wrong is to run the term through
the analyzer yourself.

require 'rubygems'
require 'ferret'

include Ferret::Analysis

EMAILS = [
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]",
  "[email protected]"
]
a = StandardAnalyzer.new

EMAILS.each do |email|
  print email + ":"
  tz = a.token_stream(:field, email)
  puts email == tz.next.text
end
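
For anyone who wants to see the mechanism without Ferret installed, here is a rough sketch of why a hand-built term lookup can miss while the query-string form hits. It is a toy model only, not Ferret's implementation or API: ToyIndex, LOWERCASING, WHITESPACE, term_hits and parsed_hits are all made-up names, and the message id is a hypothetical placeholder.

```ruby
# Toy model only: none of these names come from Ferret.
# An "analyzer" here is just a callable that turns text into index terms.
WHITESPACE  = ->(text) { text.split }           # keeps case
LOWERCASING = ->(text) { text.downcase.split }  # folds case before indexing

class ToyIndex
  def initialize(analyzer)
    @analyzer  = analyzer
    @postings  = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Analyze the text and record each distinct term against a new doc id.
  def add(text)
    doc_id = @doc_count
    @doc_count += 1
    @analyzer.call(text).uniq.each { |term| @postings[term] << doc_id }
    doc_id
  end

  # Like a hand-built term query: the term is looked up exactly as given.
  def term_hits(term)
    @postings.fetch(term, []).size
  end

  # Like a parsed query string: the text goes through the analyzer first.
  def parsed_hits(text)
    @analyzer.call(text).flat_map { |term| @postings.fetch(term, []) }.uniq.size
  end
end

mid = "Mixed.Case.ID@example.invalid" # hypothetical message id

folded = ToyIndex.new(LOWERCASING)
folded.add(mid)
puts folded.term_hits(mid)          # 0: raw term was never indexed
puts folded.term_hits(mid.downcase) # 1: matches the folded term
puts folded.parsed_hits(mid)        # 1: analyzer folds the query too

exact = ToyIndex.new(WHITESPACE)
exact.add(mid)
puts exact.term_hits(mid)           # 1: case was preserved at index time
```

In this model the 0/1 pattern appears only for ids containing uppercase characters, and only when the indexing analyzer folds case.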

Hope that clears things up.

Cheers,
Dave

On 9/28/06, William M. [email protected] wrote:

preserved case in the index.
=> #

i.search(Ferret::Search::TermQuery.new(:message_id, mid.downcase))
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#], max_score=0.3068528175354>

So it looks like WSA#token_stream does the right thing. Is it possible
it’s not actually being called at insertion time? Or am I
misunderstanding something?


William [email protected]

Hi William,

Ok, this is definitely a bug. I’ve already fixed it and it’ll be out
in the next release. By the way, you probably already know this but
you can set the analyzer used by the index.

Ferret::Index::Index.new(:analyzer => wsa)

You probably have a good reason to be doing it the way you are but I
just wanted to check.

Cheers,
Dave

Excerpts from David B.'s mail of 27 Sep 2006 (PDT):

Ok, this is definitely a bug. I’ve already fixed it and it’ll be out
in the next release.

Thank you.

By the way, you probably already know this but you can set the
analyzer used by the index.

Ferret::Index::Index.new(:analyzer => wsa)

You probably have a good reason to be doing it the way you are but I
just wanted to check.

Nope, no good reason. Just an incomplete understanding of the API. This
way’s much better.