Forum: Ferret cFerret nearing completion

David Balmain (Guest)
on 2006-03-14 04:59
(Received via mailing list)
Hey folks,

Some good news. I've finished cFerret and its Ruby bindings to the
point where I can run all of the unit tests. I still have to work out
how I'm going to package and release it, but it shouldn't be long now.
If you can't wait, you might like to try it from the Subversion
repository. It'll probably only work on Linux at the moment, and you
might have to tweak the makefile a little.

As for performance, cFerret seems to be about 10 to 20 times faster
than the pure-Ruby Ferret, so all this work has been worth it.

Cheers,
Dave
Jan Prill (Guest)
on 2006-03-16 20:21
Hey David,

This is great news, and your work is greatly appreciated. I had some
issues compiling cferret on Gentoo and therefore added a ticket (and a
small patch for the makefile) on ferret.davebalmain.com. I thought I
should mention it here in case someone searches for Gentoo on
ruby-forum.com, and show my appreciation along the way. I'm really
looking forward to searching in Ruby with Lucene syntax as fast (or
nearly as fast) as in Java.

Best Regards
Jan Prill

Alex Young (Guest)
on 2006-03-16 20:49
(Received via mailing list)
David Balmain wrote:
> Hey folks,
>
> Some good news. I've finished cFerret and its Ruby bindings to the
> point where I can run all of the unit tests.
Yay!

However, I just checked it out, and a 'make' in the project root gives:

>   1) Failure:
> test_new_binary_field(FieldTest) [./ruby/test/unit/../unit/document/tc_field.rb:96]:
> <"stored/uncompressed,binary,<name:{bin_data}>"> expected but was
> <"stored/uncompressed,binary,<name:=bin_data=>">.

Is this something you're expecting, or do you want platform details?
David Balmain (Guest)
on 2006-03-19 04:55
(Received via mailing list)
On 3/17/06, Alex Young <alex@blackkettle.org> wrote:
> > test_new_binary_field(FieldTest) [./ruby/test/unit/../unit/document/tc_field.rb:96]:
> > <"stored/uncompressed,binary,<name:{bin_data}>"> expected but was
> > <"stored/uncompressed,binary,<name:=bin_data=>">.
>
> Is this something you're expecting, or do you want platform details?

Sorry, I'm in the middle of moving the Ruby bindings from the cferret
repository to the ferret repository. cferret will then contain only
cferret, so this problem will be fixed shortly.
Alex Young (Guest)
on 2006-03-19 12:17
(Received via mailing list)
David Balmain wrote:
> repository to the ferret repository. cferret will just contain cferret
> so this problem will be fixed shortly.
Ah - no worries.  I just wasn't sure if you wanted field test data yet
:-)
Josh D. (josh-nug)
on 2006-03-27 19:22
Hi David,

thank you for the new release.

Now to the details. How do I compile the latest svn checkout? If I
follow the normal procedure, "setup.rb config && setup.rb setup &&
setup.rb install", it won't compile because files are missing in ext/.

By trial and error I found that "rake package" copies those files to
ext/. But then "setup.rb setup" gives three compile errors for
except.c. The first one is:

except.c:8: error: `THREAD_ONCE_INIT' undeclared here (not in a function)

I could not find that constant in any file in the checkout. Where is
it defined?


On the other hand, building the latest release from
http://ferret.davebalmain.com/trac/wiki/DownloadStable works.

So I played around with that release and added or changed a few things
on which I would like to hear your opinion.

* I made QueryParser's "clean_string" callable from Ruby so that one
can override the method. For that, it must be called in frt_qp_parse()
via rb_funcall(). The problem is that qp_parse() is also called directly
from C (index_get_query), and in that case the Ruby "clean_string" will
not be called.

* The current StandardAnalyzer does not parse UTF-8 strings correctly,
so I made a quick hack and copied your old regular-expression-based
StandardAnalyzer implementation over to C. Is this of interest? If so,
I would add the parts I didn't need (handling of acronyms) and send you
the diffs.

* I needed .reader on IndexSearcher. This should be in the main branch
too, right?
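
To make the first point concrete, here is a toy model in plain Ruby. It is only a sketch: ToyParser, DowncasingParser, parse and raw_parse are hypothetical names and none of this is Ferret's code. The parse method stands in for the path that reaches the hook through method dispatch (as rb_funcall would), and raw_parse stands in for C code that calls the parser directly.

```ruby
# Toy model only: these classes are hypothetical and are not part of Ferret.
class ToyParser
  # Overridable hook, analogous to clean_string.
  def clean_string(str)
    str.strip
  end

  # Path A: the hook is reached through normal method dispatch, so a
  # subclass override is honoured (comparable to using rb_funcall from C).
  def parse(str)
    raw_parse(clean_string(str))
  end

  # Path B: stands in for C code that calls the parser directly.
  # The hook is never consulted on this path.
  def raw_parse(str)
    str.split
  end
end

class DowncasingParser < ToyParser
  # Override: strip (via super) and then downcase.
  def clean_string(str)
    super.downcase
  end
end

parser = DowncasingParser.new
via_hook = parser.parse("  Ruby Ferret ")      # override runs
direct   = parser.raw_parse("  Ruby Ferret ")  # override is bypassed
```

Here via_hook is ["ruby", "ferret"] while direct is ["Ruby", "Ferret"], which is the same asymmetry as between frt_qp_parse() and a direct qp_parse() call.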


Best regards

josh
David Balmain (Guest)
on 2006-03-28 03:02
(Received via mailing list)
On 3/28/06, Josh Di <josh.nug@gmail.com> wrote:
> Hi David,
>
> thank you for the new release.
>
> To the details. How do I compile the latest svn checkout? If I do the
> normal procedure "setup.rb config && setup.rb setup && setup.rb
> install", it won't compile because files are missing in ext/.

Sorry, I forgot to check some files in. It should work now. Run

rake ext

to copy all the files into the ext directory and build the extension.
Then setup.rb should work correctly.

> <snip>
> So I played around with that one and have added/changed some things on
> which I would like to hear your opinion.
>
> * I made QueryParser's "clean_string" callable via Ruby. So that one can
> override the method. For that it must be called in frt_qp_parse() via
> rb_funcall(). Problem is: qp_parse() is also directly called from C
> (index_get_query), so in this case "clean_string" will not be called.

I've added a :clean_string attribute to Index and QueryParser. So:

    index = Index::Index.new(:clean_string => false)

will create an Index that uses a QueryParser that doesn't call the
clean_string function. This way you can clean the string yourself
before you even pass it to the search method. I think this makes the
most sense.
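
As a rough illustration of cleaning the string yourself, the sketch below tidies a query in plain Ruby before it would be passed on. clean_query is a hypothetical helper, not a Ferret function, and the rules it applies (closing an unmatched double quote, balancing parentheses) are assumptions about what a caller might want, not a description of what the built-in clean_string does.

```ruby
# Hypothetical helper, not part of Ferret: tidies a user-supplied query
# string before it is handed to a search call.
def clean_query(str)
  cleaned = str.strip

  # Close a dangling double quote so phrases are always terminated.
  cleaned += '"' if cleaned.count('"').odd?

  # Balance parentheses: drop any unmatched ")" and close any unmatched "(".
  # Simplification: parentheses inside quoted phrases are treated the same way.
  depth = 0
  balanced = String.new
  cleaned.each_char do |ch|
    case ch
    when "("
      depth += 1
      balanced << ch
    when ")"
      next if depth.zero? # unmatched closer, skip it
      depth -= 1
      balanced << ch
    else
      balanced << ch
    end
  end
  balanced + ")" * depth
end

clean_query('  (ruby AND "c ext')  # => '(ruby AND "c ext")'
clean_query('ruby) OR c')          # => 'ruby OR c'
```

With :clean_string => false set as above, the result of clean_query would then be what gets passed to the search method.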

> * The current StandardAnalyzer does not parse UTF-8 strings correctly.
> So I made a quick hack and copy-and-pasted your old SA-implementation
> with Regular Expression to C. Is this of interest? I then would add the
> stuff I didn't need (handling of acronyms) and send you the diffs.

I'd definitely like to see this. Send me a patch or the code or
whatever.
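
For anyone following along, a regular-expression tokenizer that copes with UTF-8 could look roughly like the plain-Ruby sketch below. TOKEN_PATTERN and tokenize are hypothetical names. This is not the old StandardAnalyzer code and it leaves out acronym handling entirely. It only shows the general idea of matching on Unicode letter and number classes instead of ASCII ranges.

```ruby
# Hypothetical sketch; not the StandardAnalyzer implementation.
# \p{L} matches any Unicode letter and \p{N} any Unicode number, so
# accented characters stay inside a token instead of splitting it.
# An apostrophe followed by letters is kept as part of the token.
TOKEN_PATTERN = /[\p{L}\p{N}]+(?:'\p{L}+)*/

# Returns the lowercased tokens found in a UTF-8 string.
def tokenize(text)
  text.scan(TOKEN_PATTERN).map(&:downcase)
end

tokenize("Ferret, naïve café")  # => ["ferret", "naïve", "café"]
```

An ASCII-only pattern such as /[a-zA-Z0-9]+/ would split "café" into "caf" and lose the final letter, which is the kind of breakage a UTF-8-aware analyzer has to avoid.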

> * I needed .reader on IndexSearcher. This should be in the main branch
> too, right?

I've added a reader attribute to IndexSearcher. You may like to look
at what I changed and compare it to the way you did it. The memory
management between C and Ruby can be quite confusing.