Problem importing lots of records

Pedro_SSISO-8859-1 · August 1, 2006, 1:32pm

I run a script that imports a few thousand records into the database.
The script runs once for each of several XML files. What it does is
parse the XML and for each element of a certain type creates a record in
a rails database that gets indexed with acts_as_ferret. This worked fine
before but today after a few files (70000 records) this started to
happen for any file I tried:

./imports/…/config/…/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in
`<<’: : Error occured at <index_rw.c>:703 (StandardError)
Error: exception 6 not handled: Could not obtain write lock when trying
to write index

from
./imports/…/config/…/vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:510:in
`ferret_create’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:333:in
`callback’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:330:in
`callback’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:262:in
`create_without_timestamps’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/timestamp.rb:30:in
`create’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in
`create_or_update_without_callbacks’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/callbacks.rb:249:in
`create_or_update’

from
/usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1392:in
save_without_validation' ... 24 levels... from /usr/lib/ruby/1.8/rexml/document.rb:173:inparse_stream’
from /home/pedrocr/xmlcodec/lib/stream_parser.rb:74:in `parse’
from ./imports/import-ead:118
from ./imports/import-ead:115

The index is now in this state. There is no process running that could
be holding a lock. Seems like some kind of race condition. I can provide
the index (12MB) if it would help debug this.

The only thing that I can see that might have changed behaviour is that
I added some untokenized fields to sort by.

Any ideas what it might be?

Pedro.

Pedro_SSISO-8859-1 · August 1, 2006, 2:11pm

On 8/1/06, Pedro Côrte-Real [email protected] wrote:

from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.3/lib/active_record/base.rb:1718:in `create_or_update_without_callbacks’

The index is now in this state. There is no process running that could
be holding a lock. Seems like some kind of race condition. I can provide
the index (12MB) if it would help debug this.

The only thing that I can see that might have changed behaviour is that
I added some untokenized fields to sort by.

Any ideas what it might be?

This error occurs when there is a lock file open in the index. The
first thing to do is make sure you don’t have any Index, IndexWriter,
IndexReader or Searcher open on the index. These are the classes that
lock the index. If this is already the case you have a stray lock file
which you can just delete. Look in your index for .lck files. They can
be deleted. Also note that if this probably means that some of the
data that appeared to be added correctly to the index may not have
been. The index should still be valid otherwise.

Dave

Pedro_SSISO-8859-1 · August 1, 2006, 2:53pm

On Tue, 2006-08-01 at 21:08 +0900, David B. wrote:

The only thing that I can see that might have changed behaviour is that
I added some untokenized fields to sort by.

Any ideas what it might be?

This error occurs when there is a lock file open in the index. The
first thing to do is make sure you don’t have any Index, IndexWriter,
IndexReader or Searcher open on the index. These are the classes that
lock the index.

The only thing accessing the index is acts_as_ferret and that should
release it when ruby finishes, right? I’m not running it on a webserver,
it’s running in a script that does the same as the rails script/console.
Could there be some race condition that makes it sometimes not finish?

If this is already the case you have a stray lock file
which you can just delete. Look in your index for .lck files. They can
be deleted. Also note that if this probably means that some of the
data that appeared to be added correctly to the index may not have
been. The index should still be valid otherwise.

Have to try to understand what’s happening then. It seems to be
constantly reproducible.

Pedro.

Pedro_SSISO-8859-1 · August 1, 2006, 3:26pm

On 8/1/06, Pedro Côrte-Real [email protected] wrote:

The only thing accessing the index is acts_as_ferret and that should
release it when ruby finishes, right? I’m not running it on a webserver,
it’s running in a script that does the same as the rails script/console.
Could there be some race condition that makes it sometimes not finish?

No, I don’t think so. The only thing I can think of is it could be a
reference counting error in the Ruby-Ferret bindings.

If this is already the case you have a stray lock file
which you can just delete. Look in your index for .lck files. They can
be deleted. Also note that if this probably means that some of the
data that appeared to be added correctly to the index may not have
been. The index should still be valid otherwise.

Have to try to understand what’s happening then. It seems to be
constantly reproducible.

If you can distill the error to a smaller unit test or something that
I can can reproduce here that would be great. It shouldn’t take me
long to find the problem.

Cheers,
Dave

Pedro_SSISO-8859-1 · August 1, 2006, 4:03pm

On Tue, 2006-08-01 at 22:22 +0900, David B. wrote:

Have to try to understand what’s happening then. It seems to be
constantly reproducible.

If you can distill the error to a smaller unit test or something that
I can can reproduce here that would be great. It shouldn’t take me
long to find the problem.

I’m trying to do that. The last time glibc threw a double free
corruption warning. I’m trying to reproduce it more exactly.

Pedro.

Pedro_SSISO-8859-1 · August 1, 2006, 6:09pm

On Tue, 2006-08-01 at 15:02 +0100, Pedro CÃ´rte-Real wrote:

I’m trying to do that. The last time glibc threw a double free
corruption warning. I’m trying to reproduce it more exactly.

Still haven’t been able to do it completely but here’s the exact error:

*** glibc detected *** double free or corruption (!prev): 0x0847e5b8 ***

Don’t know if it’s possible to go from that hex value to the code line.
Probably not.

Pedro.

Pedro_SSISO-8859-1 · August 1, 2006, 6:16pm

On Tue, 2006-08-01 at 17:06 +0100, Pedro CÃ´rte-Real wrote:

On Tue, 2006-08-01 at 15:02 +0100, Pedro CÃ´rte-Real wrote:

I’m trying to do that. The last time glibc threw a double free
corruption warning. I’m trying to reproduce it more exactly.

Still haven’t been able to do it completely but here’s the exact error:

*** glibc detected *** double free or corruption (!prev): 0x0847e5b8 ***

Don’t know if it’s possible to go from that hex value to the code line.
Probably not.

Here’s a backtrace:

#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7cf19a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7cf32b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7d2587a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7d2bfd4 in malloc_usable_size () from
/lib/tls/i686/cmov/libc.so.6
#5 0xb7d2c34a in free () from /lib/tls/i686/cmov/libc.so.6
#6 0xb73e8cae in tb_destroy (tb=0x0) at term.c:91
#7 0xb73e8ffc in te_destroy (te=0x49) at term.c:214
#8 0xb73e9345 in ste_close (te=0x87444c0) at term.c:326
#9 0xb73e1b2b in smi_destroy (smi=0x8a7b5f8) at index_rw.c:1984
#10 0xb73e3661 in sm_merge_term_infos (sm=0x8a66588) at index_rw.c:2282
#11 0xb73e3827 in sm_merge_terms (sm=0x8a66588) at index_rw.c:2314
#12 0xb73e3fc2 in sm_merge (sm=0x8a66588) at index_rw.c:2384
#13 0xb73e406c in iw_merge_segments_with_max (iw=0x8396400,
min_segment=18,
max_segment=20) at index_rw.c:829
#14 0xb73e42a7 in iw_merge_segments (iw=0x6, min_segment=0) at
index_rw.c:860
#15 0xb73e4311 in iw_flush_ram_segments (iw=0x8396400) at index_rw.c:918
#16 0xb73e4376 in iw_close (iw=0x8396400) at index_rw.c:965
#17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at
ind.c:218
#18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096)
at r_search.c:1900
#19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8

I seem to be able to consistently generate this. What else should I do?

Pedro.

Pedro_SSISO-8859-1 · August 2, 2006, 1:44am

On 8/2/06, Pedro Côrte-Real [email protected] wrote:

Probably not.
#7 0xb73e8ffc in te_destroy (te=0x49) at term.c:214
#17 0xb73f6c2b in index_add_doc (self=0x8ad5278, doc=0x86e63d8) at ind.c:218
#18 0xb73d4a09 in frt_ind_add_doc (argc=1, argv=0x0, self=3075201096)
at r_search.c:1900
#19 0xb7e7e5d3 in rb_iterator_p () from /usr/lib/libruby1.8.so.1.8
#20 0xb7e89089 in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#21 0xb7e89aef in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8
#22 0xb7e8675b in rb_thread_trap_eval () from /usr/lib/libruby1.8.so.1.8

I seem to be able to consistently generate this. What else should I do?

This would usually be enough but there seems to be some weird stuff
happening there. For example, argv = NULL on line #18 which doesn’t
seem possible to me. On line #14 iw gets a strange value but then it
gets restored on line #13. Anyway, I think the error is occurring
higher up on the stack so maybe it doesn’t matter. I’ll keep trying to
work out what is going on but it’d be a lot easier if you could give
me something I could run here to reproduce the error.

Cheers,
Dave

Pedro_SSISO-8859-1 · August 3, 2006, 12:22pm

On Wed, 2006-08-02 at 08:41 +0900, David B. wrote:

This would usually be enough but there seems to be some weird stuff
happening there. For example, argv = NULL on line #18 which doesn’t
seem possible to me. On line #14 iw gets a strange value but then it
gets restored on line #13. Anyway, I think the error is occurring
higher up on the stack so maybe it doesn’t matter. I’ll keep trying to
work out what is going on but it’d be a lot easier if you could give
me something I could run here to reproduce the error.

I have a standalone test case that blows up predictably every time.
Since it involves some internal data I have to clear it with the powers
that be. Shouldn’t be a problem. As soon as that’s done I’ll mail it
privately to you. Shouldn’t take too long.

Meanwhile here’s a different backtrace:

#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7d469a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d482b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7d7a87a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7d80fd4 in malloc_usable_size ()
from /lib/tls/i686/cmov/libc.so.6
#5 0xb7d8134a in free () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7b8dcae in tb_destroy (tb=0x0) at term.c:91
#7 0xb7b8dffc in te_destroy (te=0x49) at term.c:214
#8 0xb7b8e345 in ste_close (te=0x821b978) at term.c:326
#9 0xb7b86b2b in smi_destroy (smi=0x823e670) at index_rw.c:1984
#10 0xb7b88661 in sm_merge_term_infos (sm=0x80d2ba0) at index_rw.c:2282
#11 0xb7b88827 in sm_merge_terms (sm=0x80d2ba0) at index_rw.c:2314
#12 0xb7b88fc2 in sm_merge (sm=0x80d2ba0) at index_rw.c:2384
#13 0xb7b8906c in iw_merge_segments_with_max (iw=0x80a8450,
min_segment=12,
max_segment=17) at index_rw.c:829
#14 0xb7b892a7 in iw_merge_segments (iw=0x6, min_segment=0) at
index_rw.c:860
#15 0xb7b89311 in iw_flush_ram_segments (iw=0x80a8450) at index_rw.c:918
#16 0xb7b89376 in iw_close (iw=0x80a8450) at index_rw.c:965
#17 0xb7b9ab93 in index_destroy (self=0x80fd6c0) at ind.c:90
#18 0xb7b7679b in frt_ind_free (p=0x80fd6c0) at r_search.c:1677
#19 0xb7ef08de in rb_gc_call_finalizer_at_exit ()
from /usr/lib/libruby1.8.so.1.8
#20 0xb7ed28be in is_ruby_native_thread ()
from /usr/lib/libruby1.8.so.1.8
#21 0xb7eea0d3 in ruby_cleanup () from /usr/lib/libruby1.8.so.1.8
#22 0xb7eea1b3 in ruby_stop () from /usr/lib/libruby1.8.so.1.8
#23 0xb7eea84e in ruby_run () from /usr/lib/libruby1.8.so.1.8
#24 0x080485dc in main ()

Looks much cleaner since it’s actually a full one while the other was
full of ruby’s thread handling that I truncated. Probably because it was
running inside a full rails environment. This time I ripped everything
out and it’s just ferret calls.

Pedro.