Re: term vector blues

Some more on this issue. I can narrow the crash down to an index with
just one document; ferret crashes after getting its term vector a
number of times, perhaps as few as 2 or 3. The version below is tuned to
crash quickly on my system; others may need to pass a different
number as the command line argument to make the crash happen
sooner.

Some fiddling with the ferret source code reveals that disabling the
call to tv_destroy() in frt_ir_term_vector() (which is the implementation
of #term_vector) seems to make the problem go away. Experimenting with
tv_destroy(), I found that disabling just the frees of offsets and
positions is enough to keep the crash away. This suggests that there’s a
mismanagement of the allocation of the associated variables, but if so I
was unable to spot it in the source…

This is further than I’ve gotten in investigating this problem in a
while, but I’m unsure where to go next.

    require 'rubygems'
    require 'ferret'

    fields = Ferret::Index::FieldInfos.new
    fields.add_field :text, :store => :no

    scale = (ARGV.first || 662).to_i  # rand(1000)

    s = {:text => "foo bar baz " * scale}

    i = Ferret::I.new :field_infos => fields
    i << s

    9999999999.times do |j|
      tv = i.reader.term_vector(0, :text)
      print "."; STDOUT.flush
    end

On 2/21/07, Caleb C. wrote:

> tv_destroy(), I found that disabling just the frees of offsets and
> positions is enough to keep the crash away. This suggests that there’s a
> mismanagement of the allocation of the associated variables, but if so I
> was unable to spot it in the source…
>
> This is further than I’ve gotten in investigating this problem in a
> while, but I’m unsure where to go next.

Hi Caleb,

After reading your first email I found the same behavior you
describe here. This is very frustrating because in this case I create
completely independent Ruby objects. They don’t reference the Ferret
data space at all so this was the last place I expected to have
garbage collection problems. It makes no sense to me at all that not
freeing the offsets and positions arrays should make any difference at
all. If you have any more ideas with regard to this problem I’d love
to hear them as it has me a little stumped.
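
One way to test whether garbage collection timing is involved, rather than tuning the document size until the crash shows up, is to force a collection after every call. The following is only a sketch: `each_with_forced_gc` is a hypothetical helper, not part of Ferret, and the string-splitting workload is a stand-in so the snippet runs without the gem.

```ruby
# Hypothetical helper: run the block `count` times and force a full
# GC pass after each call, so memory freed too early is likely to be
# touched sooner.
def each_with_forced_gc(count)
  count.times do |n|
    yield n
    GC.start
  end
end

# Stand-in workload. With Ferret loaded, the block body would be
#   i.reader.term_vector(0, :text)
results = []
each_with_forced_gc(3) { |n| results << ("foo bar baz " * (n + 1)).split }
```

If the crash then happens on the first or second iteration regardless of the scale argument, that would point at the collector rather than at the document size.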

On 2/23/07, David B. wrote:

> > of #term_vector) seems to make the problem go away. Experimenting with
>
> After reading your first email I found the same behavior you
> describe here. This is very frustrating because in this case I create
> completely independent Ruby objects. They don’t reference the Ferret
> data space at all so this was the last place I expected to have
> garbage collection problems. It makes no sense to me at all that not
> freeing the offsets and positions arrays should make any difference at
> all. If you have any more ideas with regard to this problem I’d love
> to hear them as it has me a little stumped.

I’ve made a little more progress on this. By disabling the garbage
collector while building the term_vector I can prevent the segfault. I
guess I need to spend some time to really understand how the ruby
garbage collector works.

Adding the top and bottom lines below prevents any segfault:

    VALUE old_dont_gc = rb_gc_disable();  /* Qtrue if GC was already off */
    rtv = frt_get_tv(tv);
    tv_destroy(tv);
    if (old_dont_gc == Qfalse) rb_gc_enable();  /* restore prior GC state */
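
The same save-and-restore pattern can be tried from the Ruby side, without rebuilding the extension, to see whether the test script stops crashing. This is a rough sketch and not the fix itself: `without_gc` is a hypothetical helper, and the string workload stands in for the term vector call so the snippet runs without Ferret.

```ruby
# Hypothetical helper: disable GC around a block and restore the
# previous state afterwards, like the C lines above.
def without_gc
  was_disabled = GC.disable  # true if GC was already disabled
  yield
ensure
  # Only turn GC back on if it was on when we were called.
  GC.enable unless was_disabled
end

# Stand-in workload. With Ferret loaded, the block body would be
#   i.reader.term_vector(0, :text)
tv_like = without_gc { ("foo bar baz " * 3).split }
```

Checking the old state before re-enabling matters for nested use: if a caller had already disabled GC, the helper leaves it disabled.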