cFerret nearing completion

Hey folks,

Some good news. I’ve finished cFerret and its Ruby bindings to the
point where I can run all of the unit tests. I still have to work out
how I’m going to package and release it, but it shouldn’t be long now.
If you can’t wait, you might like to try it from the Subversion
repository. It’ll probably only work on Linux at the moment and you
might have to mess with the makefile a little.

As for performance, cFerret seems to be about 10 to 20 times faster so
all this work has been worth it.

Cheers,
Dave

Hey David,

This is great news, and your work is greatly appreciated. I’ve had some
issues compiling cFerret on Gentoo and therefore added a ticket (and a
small patch for the makefile) on ferret.davebalmain.com. I just thought I
should mention it here in case someone searches for Gentoo on
ruby-forum.com, and to show my appreciation along the way. I’m really
looking forward to searching in Ruby with Lucene syntax as fast (or
nearly as fast) as in Java.

Best Regards
Jan P.

David B. wrote:

Hey folks,

Some good news. I’ve finished cFerret and its Ruby bindings to the
point where I can run all of the unit tests.

Yay!

However, I just checked it out, and a ‘make’ in the project root gives:

  1. Failure:
    test_new_binary_field(FieldTest) [./ruby/test/unit/…/unit/document/tc_field.rb:96]:
    <“stored/uncompressed,binary,name:{bin_data}”> expected but was
    <“stored/uncompressed,binary,name:=bin_data=”>.

Is this something you’re expecting, or do you want platform details?

On 3/17/06, Alex Y. wrote:

test_new_binary_field(FieldTest) [./ruby/test/unit/…/unit/document/tc_field.rb:96]:
<“stored/uncompressed,binary,name:{bin_data}”> expected but was
<“stored/uncompressed,binary,name:=bin_data=”>.

Is this something you’re expecting, or do you want platform details?

Sorry, I’m in the middle of moving the Ruby bindings from the cferret
repository to the ferret repository. The cferret repository will then
contain only cFerret itself, so this problem will be fixed shortly.

David B. wrote:

repository to the ferret repository. cferret will just contain cferret
so this problem will be fixed shortly.

Ah, no worries. I just wasn’t sure if you wanted field test data yet :)

Hi David,

Thank you for the new release.

On to the details. How do I compile the latest svn checkout? If I follow
the normal procedure, “setup.rb config && setup.rb setup && setup.rb
install”, it won’t compile because files are missing in ext/.

By experimenting I found that “rake package” copies those files to ext/.
But now “setup.rb setup” gives three compile errors for except.c. The
first one is:

except.c:8: error: `THREAD_ONCE_INIT’ undeclared here (not in a function)

I couldn’t find that constant in any file in the checkout. Where is it
defined?

On the other hand, building the latest release from
http://ferret.davebalmain.com/trac/wiki/DownloadStable works.

So I played around with that one and have added or changed a few things
that I would like to hear your opinion on.

  • I made QueryParser’s “clean_string” callable from Ruby so that one can
    override the method. For that, it must be called in frt_qp_parse() via
    rb_funcall(). The problem is that qp_parse() is also called directly
    from C (index_get_query), so in that case “clean_string” will not be
    called.

  • The current StandardAnalyzer does not parse UTF-8 strings correctly,
    so I made a quick hack and ported your old regular-expression-based
    StandardAnalyzer implementation to C. Is this of interest? If so, I
    would add the parts I didn’t need (handling of acronyms) and send you
    the diffs.

  • I needed .reader on IndexSearcher. This should be in the main branch
    too, right?

Best regards

josh

On 3/28/06, Josh Di wrote:

Hi David,

thank you for the new release.

To the details. How do I compile the latest svn checkout? If I do the
normal procedure “setup.rb config && setup.rb setup && setup.rb
install”, it won’t compile because files are missing in ext/.

Sorry, I forgot to check some files in. It should work now. Run

rake ext

to copy all the files into the ext directory and build the extension.
Then setup.rb should work correctly.

So I played around with that one and have added/changed some things on
which I would like to hear your opinion.

  • I made QueryParser’s “clean_string” callable via Ruby. So that one can
    override the method. For that it must be called in frt_qp_parse() via
    rb_funcall(). Problem is: qp_parse() is also directly called from C
    (index_get_query), so in this case “clean_string” will not be called.

I’ve added a :clean_string attribute to Index and QueryParser. So:

index = Index::Index.new(:clean_string => false)

will create an Index that uses a QueryParser that doesn’t call the
clean_string function. This way you can clean the string yourself
before you even pass it to the search method. I think this makes the
most sense.
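
As a rough sketch of what "cleaning the string yourself" could look like, assuming the index was created with :clean_string => false as above: clean_query below is a hypothetical helper, not part of Ferret, and it only balances double quotes and parentheses, which may be less (or different) than what Ferret's own clean_string does.

```ruby
# Hypothetical helper, not part of Ferret: makes a user-supplied query
# string safer to parse by fixing unbalanced quotes and parentheses.
def clean_query(str)
  cleaned = str.strip

  # Drop all double quotes when there is an odd number of them.
  cleaned = cleaned.delete('"') if cleaned.count('"').odd?

  depth = 0
  balanced = +""
  cleaned.each_char do |ch|
    case ch
    when "("
      depth += 1
      balanced << ch
    when ")"
      # Skip a closing parenthesis that has no matching opener.
      next if depth.zero?
      depth -= 1
      balanced << ch
    else
      balanced << ch
    end
  end

  # Close any parentheses that are still open.
  balanced << (")" * depth)
end

# Assumed usage with an index created with :clean_string => false:
#   index.search(clean_query(user_input))
```

Note that this sketch does not treat parentheses inside quoted phrases specially, so it is only a starting point.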

  • The current StandardAnalyzer does not parse UTF-8 strings correctly.
    So I made a quick hack and copy-and-pasted your old SA-implementation
    with Regular Expression to C. Is this of interest? I then would add the
    stuff I didn’t need (handling of acronyms) and send you the diffs.

I’d definitely like to see this. Send me a patch or the code or
whatever.
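
For readers following along, here is a minimal Ruby sketch of regular-expression tokenization that copes with UTF-8 text. It is not the StandardAnalyzer code from either implementation; the tokenize name and the pattern are assumptions made for illustration, and acronym handling is left out entirely.

```ruby
# Hypothetical illustration, not Ferret's StandardAnalyzer.
# In Ruby, POSIX bracket classes such as [[:alnum:]] are Unicode-aware
# when matched against UTF-8 strings, so accented letters stay in tokens.
TOKEN_PATTERN = /[[:alnum:]]+/

# Splits text into lowercase word tokens.
def tokenize(text)
  text.scan(TOKEN_PATTERN).map(&:downcase)
end

# tokenize("Café Zürich 2006") returns ["café", "zürich", "2006"]
```

A byte-oriented tokenizer would typically split "Café" at the accented character, which is the kind of problem described above.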

  • I needed .reader on IndexSearcher . This should be in the main branch
    too, right?

I’ve added a reader attribute to IndexSearcher. You may like to look
at what I changed and compare it to the way you did it. The memory
management between C and Ruby can be quite confusing.