Forum: Ruby Ferret 0.2.1 (port of Apache Lucene to pure ruby)

dbalmain.ml (Guest)
on 2005-11-14 07:09
(Received via mailing list)
Hi Folks,

I've just released version 0.2.1. Since my last announcement there
have been quite a few changes, mostly to the Index::Index interface. We
also have a great new logo thanks to Jan Prill. You can check it all
out here:

http://ferret.davebalmain.com/trac/

Dave Balmain

== Description

Ferret is a full port of the Java Lucene searching and indexing
library. It's available as a gem so try it out! To get started quickly
read the quick start at the project homepage;

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

== Changes

=== Multifield searches

You can now do multi-field searches using the query parser.

    # search the title and content fields for ruby
    index.search_each("title|content:ruby") {|doc, score| puts "#{doc}:#{score}"}

    # search all fields for ruby
    index.search_each("*:ruby") {|doc, score| puts "#{doc}:#{score}"}
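
To make the field prefix concrete, here is a minimal sketch in plain Ruby of how a prefix such as title|content or * could be expanded into a list of fields to search. It does not use Ferret, and it is not how Ferret's query parser is implemented; expand_fields and the list of known fields are hypothetical names made up for this illustration.

```ruby
# Hypothetical helper, not part of Ferret: split a "fields:term" query
# into the list of fields to search and the term itself.
def expand_fields(query, known_fields)
  prefix, term = query.split(":", 2)
  # No prefix given: treat the whole string as the term and search all fields.
  return [known_fields, query] if term.nil?

  # "*" means every known field; otherwise the prefix is a "|"-separated list.
  fields = prefix == "*" ? known_fields : prefix.split("|")
  [fields, term]
end

known = ["title", "content", "author"]  # assumed field names
p expand_fields("title|content:ruby", known)
p expand_fields("*:ruby", known)
```

The sketch only covers a single term with a single prefix; a real query parser also has to handle phrases, boolean operators and nesting.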

=== Compound file support and Apache Lucene index reading

You can now store your index in compound files, which reduces the
number of files used by the index. This is useful as your index gets
bigger, to prevent a "too many open files" error. It is also handy for
reading Apache Lucene indexes, as Apache Lucene uses the compound file
format by default.

=== Merging indexes

You can now merge two or more existing indexes into one. This is useful
if you want to have indexers working in parallel to create your index
and then merge all the indexes together to create one final index.

    # add indexes 1 to 10 to the final index
    index.add_indexes([index1, index2, ... , index10])
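
As a rough picture of what merging involves, the following sketch merges toy inverted indexes in plain Ruby. It is an illustration under simplifying assumptions, not Ferret's merge code: each index is a Hash with a document count and a term-to-document-ids table, and merge_indexes is a hypothetical name.

```ruby
# Toy inverted index: term => array of document ids.
# Hypothetical illustration only; this is not how Ferret stores an index.
def merge_indexes(indexes)
  merged = Hash.new { |hash, term| hash[term] = [] }
  offset = 0
  indexes.each do |index|
    index[:postings].each do |term, doc_ids|
      # Shift ids so documents from different indexes never collide.
      merged[term].concat(doc_ids.map { |id| id + offset })
    end
    offset += index[:doc_count]
  end
  merged
end

index1 = { :doc_count => 2, :postings => { "ruby" => [0, 1], "java" => [1] } }
index2 = { :doc_count => 1, :postings => { "ruby" => [0] } }
p merge_indexes([index1, index2])
```

The point of the offset is that each parallel indexer numbers its documents from zero, so the ids have to be renumbered when the indexes are combined.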

=== Persisting an in-memory index

You can gain a little performance by using an in-memory index for
your indexing and then persisting it to your file system when you are
finished.

    index = Index::Index.new()

    # do all your indexing

    index.persist("/path/to/your/index/directory")

=== Thread safety

Ferret is now threadsafe so feel safe to use it in a multithreaded
environment. Check out the thread tests in the test/functional
directory in the latest distribution.

=== Easy update and delete

You can now use a query to do a delete;

    index.query_delete("content:java or content:perl")

And you can now easily update documents;

    index.update(34, doc)
    index.query_update('author:"David Balmain"', {:author => "Dave Balmain"})

=== Primary Key

The latest addition is a primary key to the index. Note that this only
works through the Index::Index class and should only be used if you
know what you are doing.

    index = Index::Index.new(:key => ["id", "table"])
    index << {:id => 1123, :table => "product", :product => "Jacket"}
    # ...
    # The following will replace the Jacket product with a t-shirt
    index << {:id => 1123, :table => "product", :product => "T-Shirt"}
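
The replace-on-matching-key behaviour can be pictured with a small stand-in written in plain Ruby. This is only a sketch of the semantics described above, not Ferret's implementation; KeyedStore and its methods are hypothetical names.

```ruby
# Hypothetical stand-in for an index with a composite primary key.
class KeyedStore
  def initialize(key_fields)
    @key_fields = key_fields
    @docs = {}
  end

  # Adding a document whose key fields match an existing document
  # replaces that document instead of adding a second copy.
  def <<(doc)
    key = @key_fields.map { |field| doc[field] }
    @docs[key] = doc
    self
  end

  def size
    @docs.size
  end

  def documents
    @docs.values
  end
end

store = KeyedStore.new([:id, :table])
store << { :id => 1123, :table => "product", :product => "Jacket" }
store << { :id => 1123, :table => "product", :product => "T-Shirt" }
store << { :id => 1123, :table => "customer", :name => "Dave" }
p store.size
p store.documents
```

Because the key is the combination of id and table, the third document shares an id with the product but is stored separately.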



Have fun and let me know what you think.
se (Guest)
on 2005-11-14 13:46
(Received via mailing list)
David Balmain wrote:
> Have fun and let me know what you think.

Thank you for this awesome library. I just wanted to tell you that your
work is much appreciated. I don't actually use it right now, but I most
certainly will in the future. Having such a nice and powerful search
engine is really beneficial for Ruby, too, I think.

Sascha Ebach
iamkris (Guest)
on 2005-11-14 23:47
(Received via mailing list)
Does it support indexing PDFs, Docs and PPT files? If I remember
correctly this feature is provided in Java Lucene via a project called
Jakarta POI. It is not a big deal since you already started the ball
rolling and someone might add these features in time. Kudos to your
efforts.
dbalmain.ml (Guest)
on 2005-11-15 04:27
(Received via mailing list)
Hi Kris,

If you want to index these you'll need to write (or acquire) specific
analyzers for the document type. That's how it works in Lucene too.
One solution may be to index the documents with Lucene and use Ferret
to search the indexes.

Cheers,
Dave
itsme213 (Guest)
on 2005-11-15 16:08
(Received via mailing list)
Any examples of (web) search scripts (not necessarily Ruby) that will
work with the index?

Thanks!

"David Balmain" <dbalmain.ml@gmail.com> wrote in message
news:d792e0dc0511132209o22e6c079m7c1e0b90b29d1a45@mail.gmail.com...
Hi Folks,

I've just released version 0.2.1.
jasonallen (Guest)
on 2005-11-15 16:17
(Received via mailing list)
I'm really excited about this library. However, after testing it out
I'm a little puzzled by the behavior. To test it out I added about 20
documents, each containing the same 5 fields (with different field
values in each doc). When I then try to query, the results I get seem
random - that is, they don't always return documents that I'd expect
should be matching. Example:

doc = Document.new
....
doc << Field.new("name", "foobar", Field::Store::NO, Field::Index::UNTOKENIZED)
....
index << doc

Now when I call search_each("foobar"), I don't see a result (with some I
do, with others I don't). However, if I call search_each("foobar~"), then it
seems to reliably return the expected matches. Any tips?


-jay
(running ruby 1.8.2 on OS X 10.3.8)
dbalmain.ml (Guest)
on 2005-11-15 18:24
(Received via mailing list)
Hi Jay,

You've got me puzzled too. Would it be possible for you to send me a
full example of this strange behaviour? It's possible that it only
happens on OS X. :( I really need to get my hands on a Mac for a day
because there seem to be a few problems with that environment.
Hopefully we'll have this all sorted out soon.

Thanks,
Dave