IO Errors on deleting documents with Ferret

I have a large index (~6GB, ~1 million docs) that was built by RDig.
I wrote a script to iterate through the index to clear out some
duplicate information to try to reduce the size of the index.

clients.each {|client|
  docs = RDig.searcher.search("+supplier_id:#{client.id}")
  docs.each {|doc|
    data = doc[:data].dup # the contents of the web page
    new_results = {}
    new_results[:client_id] = client.id
    new_results[:data] = data
    index.delete doc[:doc_id]
    index << new_results
  }
}

I’ve run a similar script before with no issues. Today, however, I
received the following error after 30 minutes or so:

/usr/lib/ruby/site_ruby/1.8/ferret/index.rb:726:in `initialize': IO Error occured at <except.c>:93 in xraise (IOError)
Error occured in index.c:901 - sis_find_segments_file
Error reading the segment infos. Store listing was

     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:726:in `ensure_reader_open'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:434:in `delete'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
     from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:428:in `delete'

Despite the error, the index does not appear to be corrupted, so I ran
the script again for fun. The following error occurred after
approximately 20 minutes:

/usr/lib/ruby/site_ruby/1.8/ferret/index.rb:723:in `close': IO Error occured at <except.c>:93 in xraise (IOError)
Error occured in fs_store.c:264 - fs_new_output
couldn't create OutStream /mnt/apps/search/current/…/…/shared/indexes/final/_a4kx.prx:

     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:723:in `ensure_reader_open'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:434:in `delete'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
     from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:8:in `synchrolock'
     from /usr/lib/ruby/site_ruby/1.8/ferret/index.rb:428:in `delete'

Here are the contents of the index directory:
# ls -Al …/…/shared/indexes/final/
total 5628324
-rw------- 1 initiate initiate 5713121647 Jul 31 14:22 _5d3s.cfs
-rw------- 1 root root 115159 Aug 5 12:55 _5d3s_2yyy.del
-rw------- 1 root root 22937900 Aug 5 11:28 _7tgc.cfs
-rw------- 1 root root 11475 Aug 5 12:55 _7tgc_sx7.del
-rw------- 1 root root 2220338 Aug 5 11:38 _820z.cfs
-rw------- 1 root root 2311840 Aug 5 11:47 _8alm.cfs
-rw------- 1 root root 2261887 Aug 5 11:56 _8j69.cfs
-rw------- 1 root root 2089120 Aug 5 12:05 _8rqw.cfs
-rw------- 1 root root 2244470 Aug 5 12:14 _90bj.cfs
-rw------- 1 root root 2249160 Aug 5 12:22 _98w6.cfs
-rw------- 1 root root 2231091 Aug 5 12:31 _9hgt.cfs
-rw------- 1 root root 2244881 Aug 5 12:40 _9q1g.cfs
-rw------- 1 root root 2273703 Aug 5 12:48 _9ym3.cfs
-rw------- 1 root root 235566 Aug 5 12:49 _9zgy.cfs
-rw------- 1 root root 220959 Aug 5 12:50 _a0bt.cfs
-rw------- 1 root root 229074 Aug 5 12:51 _a16o.cfs
-rw------- 1 root root 202310 Aug 5 12:52 _a21j.cfs
-rw------- 1 root root 135823 Aug 5 12:53 _a2we.cfs
-rw------- 1 root root 132935 Aug 5 12:54 _a3r9.cfs
-rw------- 1 root root 14190 Aug 5 12:54 _a3uc.cfs
-rw------- 1 root root 13868 Aug 5 12:54 _a3xf.cfs
-rw------- 1 root root 13758 Aug 5 12:54 _a40i.cfs
-rw------- 1 root root 14912 Aug 5 12:54 _a43l.cfs
-rw------- 1 root root 13750 Aug 5 12:54 _a46o.cfs
-rw------- 1 root root 14170 Aug 5 12:54 _a49r.cfs
-rw------- 1 root root 13764 Aug 5 12:55 _a4cu.cfs
-rw------- 1 root root 13719 Aug 5 12:55 _a4fx.cfs
-rw------- 1 root root 13115 Aug 5 12:55 _a4j0.cfs
-rw------- 1 root root 1826 Aug 5 12:55 _a4jb.cfs
-rw------- 1 root root 1935 Aug 5 12:55 _a4jm.cfs
-rw------- 1 root root 1739 Aug 5 12:55 _a4jx.cfs
-rw------- 1 root root 1865 Aug 5 12:55 _a4k8.cfs
-rw------- 1 root root 2072 Aug 5 12:55 _a4kj.cfs
-rw------- 1 root root 1733 Aug 5 12:55 _a4ku.cfs
-rw------- 1 root root 378 Aug 5 12:55 _a4kv.cfs
-rw------- 1 root root 462 Aug 5 12:55 _a4kw.cfs
-rw------- 1 root root 128 Aug 5 12:55 _a4kx.fdt
-rw------- 1 root root 0 Aug 5 12:55 _a4kx.fdx
-rw------- 1 root root 0 Aug 5 12:55 _a4kx.frq
-rw------- 1 root root 0 Aug 5 12:55 _a4kx.tfx
-rw------- 1 root root 0 Aug 5 12:55 _a4kx.tis
-rw------- 1 root root 0 Aug 5 12:55 _a4kx.tix
-rw------- 1 root root 0 Aug 5 12:55 ferret-write.lck
-rw------- 1 initiate initiate 16 Aug 5 12:55 segments
-rw------- 1 root root 1142 Aug 5 12:55 segments_isfj
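
The listing shows a lot of small .cfs segments next to the one large one. A rough way to summarize a directory like this is to count the files per extension. The sketch below does that in plain Ruby; the helper name count_by_extension and the path passed on the command line are made up for illustration and are not part of Ferret or RDig.

```ruby
# Hypothetical helper (not part of Ferret or RDig): count the files in a
# directory, grouped by file extension.
def count_by_extension(dir)
  counts = Hash.new(0)
  Dir.entries(dir).each do |name|
    next if name == "." || name == ".."
    ext = File.extname(name)
    ext = "(none)" if ext.empty?   # e.g. "segments", "segments_isfj"
    counts[ext] += 1
  end
  counts
end

# Usage: ruby count_ext.rb /path/to/index
if __FILE__ == $0 && ARGV[0]
  count_by_extension(ARGV[0]).sort.each do |ext, n|
    puts "#{ext}\t#{n}"
  end
end
```

A steadily growing .cfs count between runs would suggest that segments are piling up faster than they are merged.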

Here’s my platform: Linux xenU #1 SMP Thu Nov 30 13:48:50 SAST 2006
i686 athlon i386 GNU/Linux

I’m using ruby 1.8.4 and Ferret 0.11.4, which has been hacked to add
in better large file support.

Does anyone have any idea what’s going on? Many thanks in advance.

Erik

On 2007-08-05, at 7:17 PM, Erik M. wrote:

> Error occured in index.c:901 - sis_find_segments_file
> Error reading the segment infos. Store listing was

> couldn't create OutStream /mnt/apps/search/current/../../shared/indexes/final/_a4kx.prx:

Hey …

Both errors might have the same cause: too many open files.
I had similar errors some months ago, raised my open-files
limit to 32k, and haven’t had an error since.

$ ulimit -n
32768
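
To see the same limit from inside the Ruby process that runs the script, something like the following might help. It is only a sketch: the method names open_file_limits and open_fd_count are invented here, the descriptor count relies on /proc and so works on Linux only, and it assumes a Ruby that provides Process.getrlimit, which may not be true of the 1.8.4 mentioned above.

```ruby
# Sketch only: report the open-file limit for the current process.
# Process.getrlimit returns [soft_limit, hard_limit].
def open_file_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

# Approximate number of descriptors a process has open, read from /proc
# (Linux only). Returns nil where /proc/<pid>/fd is not available.
# Listing the directory may itself use a descriptor, so treat the result
# as a rough figure.
def open_fd_count(pid = Process.pid)
  dir = "/proc/#{pid}/fd"
  return nil unless File.directory?(dir)
  Dir.entries(dir).reject { |e| e == "." || e == ".." }.size
end

if __FILE__ == $0
  limits = open_file_limits
  puts "soft limit: #{limits[:soft]}, hard limit: #{limits[:hard]}"
  count = open_fd_count
  puts "open descriptors: #{count.nil? ? 'unknown' : count}"
end
```

Printing open_fd_count every few thousand documents inside the loop would show whether the number creeps up towards the soft limit before the IOError appears.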

Benjamin