Ferret 0.11.0-rc1

Hey folks,

Sorry for cross posting like this but this is an important
announcement for all Ferret users.

** Description **

Firstly for those who don’t know, Ferret is a full-text search library
which makes adding search to your application a breeze. It’s much
faster than MySQL full-text search as well as most other search libraries
out there. It allows you to do Boolean (+ruby +rails -jewelry) and
phrase queries (“the quick brown fox”) as well as some more unusual
queries like fuzzy queries (misspelling~ matches mispeling or
misspellng), wildcard queries (Aus?ral*), range queries
(date:<=20050601) and a lot more. Ferret also now offers query result
highlighting and excerpting.

** Announcement **

I’ve just released Ferret 0.11.0 which is the first release candidate
for Ferret 1.0. This release has no new features to the API but it
does fix some very major bugs. Ferret now uses lock-less commits which
fixes a problem a lot of people were having with file not found
exceptions. I’ve also fixed a number of bugs which were causing
segfaults (hopefully all of them) so Ferret is now a lot more stable.

==========
!! IMPORTANT !!

Some of these fixes mean that the current version of Ferret is not
backwards compatible. If you install the latest version you will need
to rebuild your index from scratch. Having said that, I do recommend
that everyone upgrade. The new version should be MUCH more stable.

** Try It **

$ sudo gem install ferret

Hi David,

Just did an update to your 0.11.0 release and I am seeing some problems.

I cleared out my index as suggested and ran a query. However, my index
is not rebuilding.

The error I am seeing is:

./script/…/config/…/config/…/vendor/plugins/acts_as_ferret/lib/class_methods.rb:195:
[BUG] Bus Error
ruby 1.8.5 (2006-12-25) [i686-darwin8.8.1]

Abort trap

I can see from the code that this is where it attempts to build the
index.

Is this a bug in the new version or just me?

Keith

Hey Keith,
I haven’t tested on a Mac yet. I’ll give it a go now.


Dave Balmain
http://www.davebalmain.com/

Ok, I’ve just tried on my brother’s Mac and it wouldn’t even install
until I made a simple fix. Are you sure it installed correctly without
any errors? If it did, then there is another problem I need to be
looking for. John L. is also having troubles. Hopefully he can send
me a repeatable test-case so we can sort that one out. It may even be
the same problem that you are having. Anyway, we’ll get this sorted
ASAP.

Dave


Dave Balmain
http://www.davebalmain.com/

Thanks David,

Yes it did install fine using the gem install routine.

If there is anything else I can test for you let me know.

Keith

On 2/26/07, John L. [email protected] wrote:

Hi Dave,

whilst rebuilding my keyed index with 0.11, if I run searches against it
in parallel, the search process still segfaults. If I recreate the
index unkeyed, this does not happen.

gdb --args ruby search_tester.rb

gdb output

Hi John,
Can you possibly send me a repeatable test case? That would help a
bunch. I can’t work out what the problem is from the stack trace I’m
afraid.

Cheers,
dave

Hi Dave,

whilst rebuilding my keyed index with 0.11, if I run searches against it
in parallel, the search process still segfaults. If I recreate the
index unkeyed, this does not happen.

gdb --args ruby search_tester.rb

Search for Iraq returned 0 results
Search for Iraq returned 0 results
Search for Iraq returned 10 results
Search for Iraq returned 10 results

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1211086656 (LWP 3825)]
0xbffd2614 in ?? ()
(gdb) bt
#0 0xbffd2614 in ?? ()
#1 0xb72c6c12 in isea_max_doc (self=0x860f820) at search.c:1008
#2 0xb728803f in tw_new (query=0x8535da0, searcher=0x860f820) at q_term.c:244
#3 0xb72c6959 in q_weight (self=0x8535da0, searcher=0x860f820) at search.c:356
#4 0xb7292ced in bw_new (query=0x8535d30, searcher=0x860f820) at q_boolean.c:1236
#5 0xb72c6959 in q_weight (self=0x8535d30, searcher=0x860f820) at search.c:356
#6 0xb72c883c in isea_search (self=0x860f820, query=0x8535d30, first_doc=0, num_docs=10, filter=0x0, sort=0x0, filter_func=0, load_fields=0) at search.c:1109
#7 0xb729a70c in frt_sea_search_internal (query=0x8535d30, roptions=3074577560, sea=0x860f820) at r_search.c:2549
#8 0xb729a9c7 in frt_sea_search (argc=2, argv=0xbffd4d90, self=3074612360) at r_search.c:2592
#9 0xb7ec8a38 in rb_provide () from /usr/lib/libruby1.8.so.1.8
#10 0xb7ecff7e in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#11 0xb7ed0be8 in rb_iter_break () from /usr/lib/libruby1.8.so.1.8
#12 0xb7ed7e20 in rb_apply () from /usr/lib/libruby1.8.so.1.8
#13 0xb7ed7f9a in rb_apply () from /usr/lib/libruby1.8.so.1.8

By keyed, I mean using the :key => :id feature when creating the index,
to avoid duplicate entries.
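
To make the keyed/unkeyed distinction concrete, here is a toy model in plain Ruby. It is only a sketch of the idea (adding a document whose key matches an existing one replaces it), not Ferret's implementation; `ToyKeyedIndex` is a made-up name and none of this touches the Ferret API.

```ruby
# Toy model of a keyed index. NOT Ferret code: ToyKeyedIndex is hypothetical
# and only illustrates "same key replaces the older document".
class ToyKeyedIndex
  def initialize(key = nil)
    @key = key   # field that identifies a document, or nil for an unkeyed index
    @docs = []
  end

  # Add a document. With a key set, any older document that has the same
  # value in the key field is dropped first, so no duplicates build up.
  def <<(doc)
    @docs.reject! { |d| d[@key] == doc[@key] } if @key
    @docs << doc
    self
  end

  def size
    @docs.size
  end
end

keyed = ToyKeyedIndex.new(:id)
keyed << { :id => 1, :title => "Iraq" } << { :id => 1, :title => "Iraq (updated)" }

unkeyed = ToyKeyedIndex.new
unkeyed << { :id => 1, :title => "Iraq" } << { :id => 1, :title => "Iraq (updated)" }

puts keyed.size    # 1
puts unkeyed.size  # 2
```

The extra lookup-and-replace step on every add is what sets the keyed case apart from the unkeyed one in the report above.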

John.

On Sun, 2007-02-25 at 22:46 +1100, David B. wrote:

Hey folks,

Sorry for cross posting like this but this is an important
announcement for all Ferret users.

Hi Dave,

when rebuilding my unkeyed index and running a search on that index in
parallel, the rebuilding script is bombing out with a File Not Found
exception:

/usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27:
/usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:286:in `add_document': File Not Found Error occured at <except.c>:93 in xraise (FileNotFoundError)
Error occured in fs_store.c:329 - fs_open_input
    tried to open "script/…/config/…/ferret_index/development/news_article_versions/_1.fdx" but it doesn't exist:

    from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:286:in `<<'
    from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:256:in `<<'
    from script/../config/../app/models/news_article_version.rb:119:in `ferret_update'
    from script/…/config/…/app/models/news_article_version.rb:90:in `ferret_rebuild'
    from script/../config/../app/models/news_article_version.rb:90:in `ferret_rebuild'
    from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/base.rb:783:in `benchmark'
    from /usr/lib/ruby/gems/1.8/gems/activerecord-1.14.4/lib/active_record/base.rb:794:in `silence'
    … 7 levels…
    from /usr/lib/ruby/gems/1.8/gems/rails-1.1.6/lib/commands/runner.rb:27
    from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
    from script/runner:3

I’m starting the rebuilding process first, then running the search
process. When starting the search process, I often get a Lock Error.
If I start it a couple of times, it gets going.

/usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:125:in `initialize': Lock Error occured at <except.c>:117 in xpop_context (Ferret::Store::lock::LockError)
Error occured in index.c:6044 - iw_open
    Couldn't obtain write lock when opening IndexWriter

    from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:125:in `initialize'
    from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
    from /usr/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:122:in `initialize'
    from config/../app/models/news_article_version.rb:72:in `ferret_init_index'
    from config/…/app/models/news_article_version.rb:57:in `ferret_index'
    from config/../app/models/news_article_version.rb:36:in `ferret_search'
    from /usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
    from config/../app/models/news_article_version.rb:35:in `ferret_search'
    from search_tester.rb:6
    from search_tester.rb:5
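
The "start it a couple of times" workaround can be scripted as a small retry loop. This is a minimal sketch, assuming the open call raises an error while the lock is held. `FakeLockError` and `with_lock_retry` are hypothetical names used only for illustration; with Ferret you would rescue whatever lock error class your version actually raises. It papers over the symptom and does not explain why the lock is contended.

```ruby
# Hypothetical stand-in for the lock error. Not a Ferret class.
class FakeLockError < StandardError; end

# Run the block, retrying while it raises FakeLockError.
# Gives up and re-raises after `attempts` failed tries.
def with_lock_retry(attempts = 5, delay = 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue FakeLockError
    raise if tries >= attempts
    sleep delay   # wait a little before trying to take the lock again
    retry
  end
end

# Demo: an "opener" that fails twice before it succeeds.
calls = 0
result = with_lock_retry(5, 0.01) do
  calls += 1
  raise FakeLockError, "couldn't obtain write lock" if calls < 3
  :opened
end

puts "#{result} after #{calls} attempts"   # opened after 3 attempts
```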

I rebuild in a loop in batches of 1000, with no explicit flushes or
commits. If I add a commit after each batch, the File Not Found doesn’t
crop up, but a segfault in the searcher process does (see my previous
e-mail, I guess :)
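
For reference, the batch-and-commit loop described above looks roughly like this. It is a sketch only: `RecordingWriter` is a hypothetical stand-in that just counts calls, and the assumption is that the real index object accepts `<<` and has some commit-style call. Check the API of your Ferret version before copying the shape.

```ruby
# Hypothetical stand-in for an index writer. It only counts what happens
# to it, so the batching logic can run without any index on disk.
class RecordingWriter
  attr_reader :added, :commits

  def initialize
    @added = 0
    @commits = 0
  end

  def <<(doc)
    @added += 1
    self
  end

  def commit
    @commits += 1
  end
end

# Add records in batches and commit once after each batch.
def rebuild_in_batches(records, writer, batch_size = 1000)
  records.each_slice(batch_size) do |batch|
    batch.each { |record| writer << record }
    writer.commit   # assumed commit-style call, made once per batch
  end
  writer
end

writer = rebuild_in_batches((1..2500).to_a, RecordingWriter.new)
puts "#{writer.added} docs, #{writer.commits} commits"   # 2500 docs, 3 commits
```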

btw, I’m deleting the index directory before rebuilding.

John.

http://johnleach.co.uk

Hi David,

Apologies for the BCC, wasn’t sure if you were subscribed to
rubyonrails-talk and I didn’t want to cross-post the reply.

Before upgrading, I was running 0.11.0, but it appears as though the
gem update I did just now picked up a newer version (0.11.3). Does
this last one include all of the segfault+filenotfound fixes?

Before upgrading earlier today we hit a couple of bugs, all arising in
the same scenario: one process is creating many ActiveRecord models
which have acts_as_ferret, and so is updating the index, while the
rails application itself (so another process) is doing lots of
find_by_contents(), the first of which takes a very long time for some
reason (no idea why the finds would themselves cause a re-indexing to
happen, but they appear to).

The first time we ran our scenario, the process doing the creating
barfed due to a weird lock error:

/usr/local/ruby-1.8.5/lib/ruby/gems/1.8/gems/rails-1.2.2/lib/commands/runner.rb:45:
/usr/local/ruby-1.8.5/lib/ruby/gems/1.8/gems/ferret-0.11.0/lib/ferret/index.rb:666:in `initialize': Lock Error occured at <except.c>:117 in xpop_context (Ferret::Store::lock::LockError)
Error occured in index.c:6044 - iw_open
    Couldn't obtain write lock when opening IndexWriter

The second time it’s the rails app itself that failed in the
controller action doing the find_by_contents, and it failed with a
different error: a file not found on a file that it was expecting to
find in the index directory.

I haven’t yet been able to reproduce these problems with 0.11.3 (they
occurred with 0.11.0), but I haven’t tried very hard.

I guess my questions are:

1- with the latest version of ferret, 0.11.3, is concurrent access to
the index from multiple processes supposed to work?

2- with the latest version of ferret, 0.11.3, if I add a bunch of
stuff to the index, then do a find_by_contents which initially takes a
long while (because it causes an indexing), then later add more to the
index by creating acts_as_ferret models, should I expect the following
find_by_contents to take a long time, just like the initial one?
Right now it seems to be the case. I’m a little confused as to
where/when the reindexing is supposed to happen, and why it seems
index additions cause a reindexing by the following find_by_contents
call.

-Bosko


Bosko M. [email protected]
http://www.crowdedweb.com/