Odd indexing issue

Hey Dave,
I just contributed $100 to the ferret donation box. My project isn’t
earning any money yet (hopefully it will), but for now I hope this helps
you out and covers me for asking stupid questions ;).

To get distance-sorted output, I pass an array of the id field from a
ferret search through to MySQL in a custom select statement:
SELECT … id IN (#{ids.join(",")})
This has been working fine through ferret 0.9. I moved to 0.10 this week
and it has seemed OK, but I’m not sure whether I just wasn’t triggering
the error. It happens on 0.10.6 and on 0.10.7.

Today the SQL statement was invalid on a certain query. This turned out
to be because one or more of the ids passed into the IN clause were not
numbers but some sort of weird character sequence like \240\236D\010 or
\350\240\227\010.
I’ve tried deleting the index and rebuilding it. It keeps happening,
although on different items in the index on each rebuild. This happens
on 2 different machines, both running Debian sarge. Below is a little
console script with output showing the oddness.
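
As a stopgap on the SQL side, a guard along these lines could keep one bad id from invalidating the whole statement. This is only a sketch: `numeric_id?` and `build_in_clause` are hypothetical helper names, not part of ferret or Rails, and it works around the symptom without explaining why the index returns non-numeric ids.

```ruby
# Hypothetical helpers, not part of ferret or Rails.

# True only when the value is a non-empty run of ASCII digits.
# Works on raw bytes so strings that are not valid text cannot raise.
def numeric_id?(value)
  s = value.to_s
  !s.empty? && s.bytes.all? { |b| b >= 48 && b <= 57 } # '0'..'9'
end

# Returns [clause, rejected]: the IN clause built from the numeric ids
# (nil when there are none) and the list of ids that were left out.
def build_in_clause(ids)
  good, bad = ids.partition { |id| numeric_id?(id) }
  clause = good.empty? ? nil : "id IN (#{good.join(",")})"
  [clause, bad]
end

clause, rejected = build_in_clause(["12", "\240\236D\010", "345"])
puts clause
puts "rejected #{rejected.size} id(s)"
```

Returning the rejected ids alongside the clause makes it possible to log which documents came back corrupted instead of dropping them silently.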

The relevant model code is at the bottom of this post, please let me
know if there’s anything else I can supply.

Sam

--------ruby script/console

Entry.create_ferret_index

index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)

# an arbitrary query to return all results from the index

index.search_each("*", {:limit => 6000}) do |doc, score|
  if index[doc][:id] !~ /^\d+$/ then # show me ids that aren't numeric
    p doc.to_s + " " + index[doc][:id]
  end
end

OUTPUT FROM THE ABOVE 1st TIME
"542 \2102\032"
"2294 0\3075\010"
"4186 \250* \010"
OUTPUT FROM THE ABOVE 2nd TIME
"1762 \260\020\036\010"
"2617 \000\000\000\000"
"2719 0+\010"
"3176 p0\010"
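
To look at the raw bytes behind those escaped strings, something like the following can be pasted into the console. It is a minimal sketch: `hex_bytes` is a hypothetical helper name, and the sample values are ids copied from this post, written with the same octal escapes.

```ruby
# Hypothetical helper: render every byte of a string as two hex digits.
def hex_bytes(str)
  str.unpack("C*").map { |b| format("%02x", b) }.join(" ")
end

# Ids copied from the post, using the same octal escapes.
samples = ["\240\236D\010", "\350\240\227\010", "0\3075\010"]

samples.each do |s|
  puts "#{s.unpack("C*").size} bytes: #{hex_bytes(s)}"
end
```

Seeing the values as plain bytes makes it easier to compare the bad ids across rebuilds and to spot any pattern they share.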

---------------from entry.rb

def self.create_ferret_index()

  field_infos = Ferret::Index::FieldInfos.new(:store => :no, :index => :yes,
                                              :term_vector => :no, :boost => 1.0)
  field_infos.add_field(:name, :store => :no, :index => :yes,
                        :term_vector => :with_positions_offsets, :boost => 10.0)
  field_infos.add_field(:address, :store => :no, :index => :yes,
                        :term_vector => :with_positions_offsets, :boost => 1.0)
  field_infos.add_field(:tags, :store => :no, :index => :yes,
                        :term_vector => :with_positions_offsets, :boost => 5.0)
  field_infos.add_field(:id, :store => :yes, :index => :untokenized,
                        :term_vector => :no)

  field_infos.create_index(FerretConfig::INDEXPATH)

  index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)

  batch_size = 1000
  Entry.transaction do
    0.step(Entry.count, batch_size) do |i|
      Entry.find(:all, :limit => batch_size, :offset => i).each do |rec|
        index << rec.make_entry_ferret_doc
      end
    end
  end
  index.flush
  index.optimize
  index.close

end

def make_entry_ferret_doc

  doc = Ferret::Document.new
  doc[:id] = self.id
  doc[:name] = self.name
  doc[:address] = self.physical_address
  doc[:tags] = self.tags

  doc

end
