Adding entry breaks index


#1

Our ferret 0.10.13 index has been slowly growing on our debian server
and has just got up over 14,000 records. Yesterday I randomly noticed
that one search I did was suddenly giving whack, unexpected results. I
have spent much time trying to track the problem.

Tried ferret 0.10.9 - no change.
Tried on a windows machine - where it works fine, and doesn’t give weird
results (which just adds to the strangeness - anyway I need it to work
on the debian server)

I narrowed it down to a single entry that, when added to or deleted from
the index, completely changes the results of unrelated searches.
A little console output shows this best.

index = Ferret::Index::Index.new(FerretConfig::INDEXOPTIONS)

puts index.search("westpac").total_hits
286
puts index.search("westpac branch").total_hits
277

doc = Entry.find(1094481).make_entry_ferret_doc
=> {:latitude1d=>"36.9", :address=>"61 Remuera Rd, Newmarket",
:longitude1d=>"174.8", :name=>"Spiro's Florists", :precision=>"1
number", :tags=>"Flowers, bouquets, gift baskets, permanent floral
arrangements, inter-flora", :zid=>1094481}
index << doc
index.flush
index.optimize

puts index.search("westpac").total_hits
286
puts index.search("westpac branch").total_hits
3

index.delete("1094481")
index.flush
index.optimize

puts index.search("westpac").total_hits
286
puts index.search("westpac branch").total_hits
277

I’m completely lost on this. It makes no sense to me at all.
Rebuilding the index doesn’t help. It happens the same on 2 similar but
independent debian boxes.

Anyone got any clues as to where to start?
While it’s fine to just remove this entry and presume everything is
working - without knowing why this breaks it’s pretty hard to have faith
in the index not breaking again…
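
Until the cause is known, one way to notice a recurrence is to record hit counts for a few known queries and compare them after each addition. This is only a minimal sketch: it assumes nothing more than an index object that responds to search(query).total_hits as in the console session above, and the helper names snapshot_hits and changed_queries are hypothetical, not part of Ferret.

```ruby
# Hypothetical helpers, not part of Ferret. `index` is anything that responds
# to search(query) and returns an object with total_hits, as in the session
# shown earlier in this post.

# Record the current hit count for each canary query.
def snapshot_hits(index, queries)
  queries.each_with_object({}) do |query, counts|
    counts[query] = index.search(query).total_hits
  end
end

# Return { query => [before, after] } for every query whose count changed.
def changed_queries(before, after)
  before.each_with_object({}) do |(query, old_count), diff|
    new_count = after[query]
    diff[query] = [old_count, new_count] if new_count != old_count
  end
end

# Usage against a real index (assumed, mirrors the calls in the session):
#   queries = ["westpac", "westpac branch"]
#   before  = snapshot_hits(index, queries)
#   index << doc
#   index.flush
#   p changed_queries(before, snapshot_hits(index, queries))
```

A new document that legitimately matches one of the canary queries will change its count too, so the queries need to be chosen with that in mind.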

Really appreciate any thoughts,
Sam


#2

On Sat, Feb 10, 2007 at 06:03:47AM +0100, Sam wrote:

> Our ferret 0.10.13 index has been slowly growing on our debian server
> and has just got up over 14,000 records. Yesterday I randomly noticed
> that one search I did was suddenly giving whack, unexpected results. I
> have spent much time trying to track the problem.
>
> Tried ferret 0.10.9 - no change.
> Tried on a windows machine - where it works fine, and doesn’t give weird
> results (which just adds to the strangeness - anyway I need it to work
> on the debian server)

Could you try Ferret 0.10.14?

> 286
> puts index.search("westpac branch").total_hits
> 3

Really strange. To track this down further I’d try variations of
this record, i.e. leave one field empty, then another, to find out
which field’s value is causing the problem. Btw, how many hits does
index.search("branch").total_hits
yield with and without that record?
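
The field-by-field test suggested here could be sketched roughly as below. The helper probe_without_each_field and its skip: option are hypothetical and not part of Ferret. It only builds the document variants and collects whatever the block returns; the indexing and searching happen in the block and are assumed to follow the calls already shown in this thread.

```ruby
# Hypothetical helper, not part of Ferret. For each field of `doc` (except
# those listed in `skip`), builds a copy with that one field removed, passes
# the copy to the block, and records the block's return value (for example a
# hit count) under the name of the removed field. `doc` itself is not modified.
def probe_without_each_field(doc, skip: [])
  (doc.keys - skip).each_with_object({}) do |field, results|
    variant = doc.reject { |key, _value| key == field }
    results[field] = yield(variant)
  end
end

# Usage against a real index (assumed, mirrors the calls quoted in this thread;
# :zid is skipped so the record can still be deleted by its id afterwards):
#   results = probe_without_each_field(doc, skip: [:zid]) do |variant|
#     index << variant
#     index.flush
#     hits = index.search("westpac branch").total_hits
#     index.delete("1094481")
#     index.flush
#     hits
#   end
#   p results
```

A field whose removal brings the count back to its usual value is the one to look at more closely.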

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66


#3

Everything happens the same with 0.10.14.

index.search("branch").total_hits
is constant at 811 through all tests.

I guessed that it was something to do with the tags field. Removing it
before adding the doc made everything ok, so I played with changing the
values in the tags field and narrowed it down to this.

If tags is, or contains, any of the following words
baskets
basket
ba
ball
baloney
basketcase
babaracchus

then the hit count for
westpac branch
drops from 277 to 3.

If tags is any of
b
ba baracchus

then the hit count for
westpac branch
stays at 277.
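
For anyone wanting to repeat this kind of sweep over more tag values, a rough sketch follows. The helper split_by_effect is hypothetical and not part of Ferret. It just sorts candidate values by whether the block reports a hit count different from the baseline; the block is assumed to do the add, search and delete cycle shown in the first post.

```ruby
# Hypothetical helper, not part of Ferret. Yields each candidate value to the
# block, which is expected to return the resulting hit count, and splits the
# candidates into those that change the count and those that leave it alone.
def split_by_effect(candidates, baseline)
  breaking, harmless = candidates.partition { |value| yield(value) != baseline }
  { breaking: breaking, harmless: harmless }
end

# Usage against a real index (assumed, mirrors the calls from the first post):
#   report = split_by_effect(%w[baskets basket ba ball b], 277) do |value|
#     index << doc.merge(:tags => value)
#     index.flush
#     hits = index.search("westpac branch").total_hits
#     index.delete("1094481")
#     index.flush
#     hits
#   end
#   p report
```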

Looks like even the A-team can’t help me…


#4

Hey Sam,

Dave said he is going to look into this in the near future… We’ll
hopefully get some information about your problem soon.

Ben


#5

Hi Sam,

Do you think it would be possible to send me a copy of the index (if
the data isn’t sensitive)? It would be really helpful as I can’t seem
to reproduce the problem. I’m on Ubuntu here so I should be able to
replicate the problem with the index.

Cheers,
Dave


#6

It’s open source data so no problem there. Index sent off list…


#7

On 2/13/07, Sam wrote:

> It’s open source data so no problem there. Index sent off list…

Thanks Sam, problem fixed. Ben emailed me privately about this bug
suggesting that it might be serious. He was quite correct. When I put
out the fix for this it will require everyone to rebuild their
indexes.

I’m going to add another fix to get rid of the FileNotFound bug that a
lot of people have been getting (yes, I’ve finally found the cause of
this one) and then I’ll put another release out. I was going to make
this change backwards compatible but since there is a bug in the
current index format and everyone will need to rebuild anyway, I guess
it probably isn’t necessary. If anyone can’t rebuild their indexes for
some reason, please let me know and I’ll try and come up with a
solution before I put the next release out.

Once these fixes are out and I’m happy I haven’t introduced any new
bugs I’ll be releasing Ferret 1.0 so look out for it.

Cheers,
Dave


#8

David B. wrote:

> Once these fixes are out and I’m happy I haven’t introduced any new
> bugs I’ll be releasing Ferret 1.0 so look out for it.

Awesome Dave! Lovely to have you back.