I successfully installed AAF on my TextDrive OpenSolaris Container, but
I’m having some issues with indexing.
I have a model called Blogs which has AAF enabled.
The first time I tried find_by_contents for a 'word' I knew was in the
database, I got no results. Apparently the index was not ready yet.
Then I waited a few hours and checked that the /index directory was
receiving no changes, so the indexing was not happening either.
Then I tried to re-index and I got the following error after a few hours
of work:
Blog.rebuild_index
IOError: IO Error occured at <except.c>:79 in xraise
  Error occured in fs_store.c:324 - fs_open_input
  couldn't create InStream
  script/../config/../config/../index/production/blog/_73j.fdx:
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `delete'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `<<'
        from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:256:in `<<'
        from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:199:in `rebuild_index'
        from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:198:in `rebuild_index'
        from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:197:in `rebuild_index'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transactions.rb:91:in `transaction'
        from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:196:in `rebuild_index'
        from ./script/../config/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:194:in `rebuild_index'
        from (irb):9
Again, it seems that the index is incomplete and is returning partial
results.
Any suggestions on what to do?
PS: Nothing was querying the DB during the indexing; the only thing
running against that DB was the console where I ran rebuild_index.
On Sun, Jan 21, 2007 at 09:32:25PM +0100, Manoel L. wrote:
> receiving no changes, so the indexing was not happening either.
> Then I tried to re-index and I got the following error after a few hours
> of work:

Does this mean it took a few hours to rebuild the index, or did you
only start the rebuild after a few hours?
> Blog.rebuild_index
> IOError: IO Error occured at <except.c>:79 in xraise
> Error occured in fs_store.c:324 - fs_open_input
>   couldn't create InStream
>   script/../config/../config/../index/production/blog/_73j.fdx:
Strange. This really looks like the index has been modified by
something else while the rebuild was running. Could you try to start
over with a new, empty index directory?
Jens
–
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66
Yes, it took a few hours between the start of rebuild_index and the
failure.
I don't think that any other process is modifying the index folder,
but I'll try your suggestion: cleaning the index folder and running
rebuild_index again.
Thanks for the attention.
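The cleanup step described above can be sketched in a few lines of Ruby. The index path here is an assumption based on this thread's setup, so adjust it to your own app:

```ruby
require 'fileutils'

# Wipe the Ferret index so rebuild_index starts from scratch.
# NOTE: this path is an assumption based on the thread; point it at
# your own RAILS_ROOT index directory.
index_dir = File.join('index', 'production', 'blog')
FileUtils.rm_rf(index_dir)

# Then, with no other process (app servers, cron jobs) touching the
# index, rebuild it from the console or script/runner:
#   Blog.rebuild_index
```

The important part is the second step: nothing else may open the index directory while the rebuild runs.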
Jens K. wrote:
> Strange. This really looks like the index has been modified by
> something else while the rebuild was running. Could you try to start
> over with a new, empty index directory?
Maybe you are correct. Actually, my Rails application was up: while I
was running Blog.rebuild_index in the console, the Rails app was
running.
Is this the kind of simultaneous modification of the index that you
talked about?
If yes, how will Ferret and Acts-As-Ferret behave in a real life
situation where we have several Mongrels running the Rails application?
Is this a problem?
The Blog.rebuild_index is running, I’ll let you know the results (now
with only the console running).
On Mon, Jan 22, 2007 at 12:25:13PM +0100, Manoel L. wrote:
> Jens,
> Maybe you are correct. Actually, my Rails application was up: while I
> was running Blog.rebuild_index in the console, the Rails app was
> running.
> Is this the kind of simultaneous modification of the index that you
> talked about?
Exactly.
> If yes, how will Ferret and Acts-As-Ferret behave in a real life
> situation where we have several Mongrels running the Rails application?
> Is this a problem?
It should not, since Ferret is supposed to have file-system-based
locking that manages inter-process synchronisation.
However, it doesn't seem to be reliable under certain circumstances;
the usual workaround is to use a backgroundrb process that does all
the indexing, and to do only the searching inside the Mongrels.
Unfortunately aaf does not support this kind of remote indexing yet,
but it is definitely on my list.
> The Blog.rebuild_index is running, I'll let you know the results (now
> with only the console running).
Sounds like you're indexing a whole farm of blogs. I'm still wondering
about the reason for the long indexing time.
cheers,
Jens
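The single-writer workaround described above can be illustrated with plain DRb from Ruby's standard library (backgroundrb works similarly at a higher level). `IndexService` and the port are made up for this sketch; a real service would write to a Ferret::Index instead of an array:

```ruby
require 'drb'

# One process owns the index; every Mongrel talks to it over DRb, so
# Ferret's file lock is never contended. IndexService is hypothetical,
# not part of Ferret or acts_as_ferret.
class IndexService
  def initialize
    @docs = []            # stand-in for a real Ferret::Index
    @lock = Mutex.new
  end

  def add(doc)
    @lock.synchronize { @docs << doc }   # all writes serialize here
  end

  def size
    @lock.synchronize { @docs.size }
  end
end

DRb.start_service('druby://127.0.0.1:9010', IndexService.new)

# In each Mongrel process you would only search locally, and send
# writes to the service:
#   index = DRbObject.new_with_uri('druby://127.0.0.1:9010')
#   index.add(:id => 42, :title => 'hello')
```

The design point is that only the service process ever opens the index for writing, which sidesteps the unreliable inter-process locking.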
On Mon, Jan 22, 2007 at 11:59:06AM +0100, Manoel L. wrote:
> Jens, answering your questions:
> Yes, it took a few hours between the start of rebuild_index and the
> failure.
Wow. Either that machine is really slow, or you have an enormous amount
of data to index…
Or something really weird is going on there.
> I don't think that any other process is modifying the index folder,
> but I'll try your suggestion. Cleaning the index folder and running
> rebuild_index again.
Let us know how it works out.
Jens
In fact, I'm indexing around 150K blogs. My app is a blog/posts
indexing service, much like Technorati but focused on the Brazilian
blogosphere.
Same error, even with only the console running Blog.rebuild_index, see:
/opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27:
/opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `delete': IO Error occured at <except.c>:79 in xraise (IOError)
  Error occured in fs_store.c:324 - fs_open_input
  couldn't create InStream
  script/../config/../index/production/blog/_3pe.fdx:
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:273:in `<<'
        from /opt/csw/lib/ruby/1.8/monitor.rb:229:in `synchronize'
        from /opt/csw/lib/ruby/gems/1.8/gems/ferret-0.10.14/lib/ferret/index.rb:256:in `<<'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:199:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:198:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:197:in `rebuild_index'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/connection_adapters/abstract/database_statements.rb:51:in `transaction'
        from /opt/csw/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/transactions.rb:91:in `transaction'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:196:in `rebuild_index'
        from ./script/../config/../vendor/plugins/acts_as_ferret/lib/class_methods.rb:194:in `rebuild_index'
        from (eval):1
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `eval'
        from /opt/csw/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27
        from /opt/csw/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require'
        from /opt/csw/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
        from ./script/runner:3
Suggestions?
Anything else I can do to gather more debug data?
> It should not, since Ferret is supposed to have a file system based
> locking that manages inter-process synchronisation.
(a bit OT, but since it was mentioned…)
As in managing simultaneous writes as well?
Reason I’m asking is, I wrote an app a few months ago which is a
networked index that is supposed to handle multiple “clients” writing
to the index at the same time. What I did was to write a class that
queued those requests and dispatched them one at a time, since
otherwise, the server would crash because of Ferret locking issues.
That was around Ferret 0.9.3 or so.
I understand I could flush the index every time I insert something,
but that's a performance cost I can't afford…
I still cannot complete the index rebuild.
Every time I try it, I get the same error (though in different files).
Now I'm totally sure that only a single process (the console) is
running.
> > It should not, since Ferret is supposed to have a file system based
> > locking that manages inter-process synchronisation.
> (a bit OT, but since it was mentioned…)
> As in managing simultaneous writes as well?
The locking is supposed to prevent simultaneous writing. AFAIR, Ferret
internally waits some time and then retries the write, throwing an
error if it still doesn't succeed.
> Reason I'm asking is, I wrote an app a few months ago which is a
> networked index that is supposed to handle multiple "clients" writing
> to the index at the same time. What I did was to write a class that
> queued those requests and dispatched them one at a time, since
> otherwise, the server would crash because of Ferret locking issues.
> That was around Ferret 0.9.3 or so.
I’d still go this route to make sure the index stays sane, especially
with a heavily loaded app.
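The queue-and-dispatch approach described above can be sketched like this; `IndexWriterQueue` is a made-up name, and the writer block stands in for real Ferret calls:

```ruby
require 'thread'

# Serialize index writes through a single worker thread: callers
# enqueue documents and return immediately; only the worker ever
# touches the index, so Ferret's lock is never contended.
# IndexWriterQueue is a hypothetical illustration, not part of Ferret
# or acts_as_ferret.
class IndexWriterQueue
  STOP = Object.new   # sentinel telling the worker to finish

  def initialize(&writer)
    @queue  = Queue.new
    @worker = Thread.new do
      while (doc = @queue.pop) != STOP
        writer.call(doc)   # e.g. ferret_index << doc in a real setup
      end
    end
  end

  def <<(doc)
    @queue << doc          # callers never block on Ferret's lock
  end

  def shutdown             # drain remaining documents, then stop
    @queue << STOP
    @worker.join
  end
end
```

Used as `q = IndexWriterQueue.new { |doc| ferret_index << doc }`, every client pushes into `q` and only the worker thread writes, which is exactly the "one at a time" dispatch described above.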
Jens
Seems that the problem really was concurrent processes building the
index at the same time. I was not aware that I had a runner process in
cron.
Now I'm running Blog.rebuild_index truly alone, and no failures so
far.
The crazy thing is, 19 HOURS of CPU already, and I think we are still
far from the end.
I don't know what a completed index looks like, but the file names
give me an idea of the progress.
TOP result:

load averages: 3.25, 4.62, 5.88                              14:09:29
73 processes: 71 sleeping, 2 on cpu
CPU states: 19.8% idle, 62.1% user, 18.1% kernel, 0.0% iowait, 0.0% swap
Memory: 16G real, 2053M free, 7520M swap in use, 21G swap free

   PID USERNAME LWP PRI NICE SIZE  RES STATE  TIME   CPU  COMMAND
  5737 pocscom    1  59    0  36M  32M sleep 19:38  0.32% runner
Current contents of app/index/production/blog:
[92140-AA:~/web/labs/blogblogs/trunk/index/production/blog] pocscom$ ls -al
total 349436
drwxr-xr-x 2 pocscom pocscom 34 Jan 24 14:10 ./
drwxr-xr-x 3 pocscom pocscom  3 Jan 23 10:40 ../
-rw------- 1 pocscom pocscom 53M Jan 23 17:25 _8km.cfs
-rw------- 1 pocscom pocscom 45M Jan 24 00:03 _h59.cfs
-rw------- 1 pocscom pocscom 35M Jan 24 06:20 _ppw.cfs
-rw------- 1 pocscom pocscom 2.6M Jan 24 06:57 _qkr.cfs
-rw------- 1 pocscom pocscom 1000K Jan 24 07:34 _rfm.cfs
-rw------- 1 pocscom pocscom 2.2M Jan 24 08:12 _sah.cfs
-rw------- 1 pocscom pocscom 4.5M Jan 24 08:50 _t5c.cfs
-rw------- 1 pocscom pocscom 4.0M Jan 24 09:32 _u07.cfs
-rw------- 1 pocscom pocscom 4.9M Jan 24 10:16 _uv2.cfs
-rw------- 1 pocscom pocscom 4.3M Jan 24 11:27 _vpx.cfs
-rw------- 1 pocscom pocscom 3.3M Jan 24 12:24 _wks.cfs
-rw------- 1 pocscom pocscom 5.2M Jan 24 13:32 _xfn.cfs
-rw------- 1 pocscom pocscom 968K Jan 24 13:38 _xiq.cfs
-rw------- 1 pocscom pocscom 656K Jan 24 13:45 _xlt.cfs
-rw------- 1 pocscom pocscom 226K Jan 24 13:50 _xow.cfs
-rw------- 1 pocscom pocscom 655K Jan 24 13:54 _xrz.cfs
-rw------- 1 pocscom pocscom 457K Jan 24 13:58 _xv2.cfs
-rw------- 1 pocscom pocscom 575K Jan 24 14:03 _xy5.cfs
-rw------- 1 pocscom pocscom 459K Jan 24 14:07 _y18.cfs
-rw------- 1 pocscom pocscom 82K Jan 24 14:07 _y1j.cfs
-rw------- 1 pocscom pocscom 42K Jan 24 14:08 _y1u.cfs
-rw------- 1 pocscom pocscom 42K Jan 24 14:08 _y25.cfs
-rw------- 1 pocscom pocscom 3.6K Jan 24 14:09 _y2g.cfs
-rw------- 1 pocscom pocscom 2.5K Jan 24 14:09 _y2r.cfs
-rw------- 1 pocscom pocscom 121K Jan 24 14:10 _y32.cfs
-rw------- 1 pocscom pocscom 584 Jan 24 14:10 _y33.cfs
-rw------- 1 pocscom pocscom 593 Jan 24 14:10 _y34.cfs
-rw------- 1 pocscom pocscom 94 Jan 24 14:10 _y35.fdt
-rw------- 1 pocscom pocscom 0 Jan 24 14:10 _y35.fdx
-rw------- 1 pocscom pocscom 0 Jan 23 10:40 ferret-write.lck
-rw------- 1 pocscom pocscom 79 Jan 24 14:10 fields
-rw------- 1 pocscom pocscom 195 Jan 24 14:10 segments