Forum: Ferret ferret or not ferret?

mix (Guest)
on 2007-03-01 21:18
Hi, I have to choose a search engine for a medium-to-large site with a
lot of searches and inserts happening at the same time. Can you suggest
something? I'm thinking about Ferret, but I read that it has some
problems with this kind of "work" :(
Jens K. (Guest)
on 2007-03-02 11:16
(Received via mailing list)
On Thu, Mar 01, 2007 at 08:18:32PM +0100, mix wrote:
> Hi, I have to choose a search engine for a medium-to-large site with a
> lot of searches and inserts happening at the same time. Can you suggest
> something? I'm thinking about Ferret, but I read that it has some
> problems with this kind of "work" :(

Ferret has recently had several improvements in this area (see Dave's
posts about the latest release candidates).

Even if you should still experience problems with multiple processes
accessing the index, you can always set up a simple DRb server to do the
indexing/search work.
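
A minimal sketch of such a DRb server, using only Ruby's standard library. The `IndexServer` class and its in-memory hash are hypothetical stand-ins for a real index; in practice the object would wrap the Ferret index, and this is not how acts_as_ferret implements its own server. The idea shown is simply that one process owns the index and every other process talks to it over DRb.

```ruby
require 'drb/drb'

# Hypothetical stand-in for a real index: a hash of id => text behind a lock.
# In a real setup this object would wrap the search index instead.
class IndexServer
  def initialize
    @docs = {}
    @lock = Mutex.new
  end

  # Add or replace a document; all writes happen inside this one process.
  def add(id, text)
    @lock.synchronize { @docs[id] = text }
    true
  end

  # Return the ids of all documents whose text contains the term,
  # ignoring case.
  def search(term)
    needle = term.downcase
    @lock.synchronize do
      @docs.select { |_id, text| text.downcase.include?(needle) }.keys
    end
  end
end

# Server side: expose the single index object over DRb.
# Port 0 lets the OS pick a free port; a deployment would use a fixed URI.
DRb.start_service('druby://localhost:0', IndexServer.new)

# Client side (normally a separate process, e.g. each web worker).
remote_index = DRbObject.new_with_uri(DRb.uri)
remote_index.add(1, 'Best of Open Source')
remote_index.add(2, 'Lucene in Action')
puts remote_index.search('source').inspect
```

Because DRb exposes the object to anyone who can reach the port, binding to localhost as above is the cautious default.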

Or you can have a look at acts_as_ferret, which has such a server
already built in. Not to mention that acts_as_ferret would make
integrating Ferret-based full-text search into your app a one-liner :-)


Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
removed_email_address@domain.invalid | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
David B. (Guest)
on 2007-03-02 11:33
(Received via mailing list)
On 3/2/07, mix <removed_email_address@domain.invalid> wrote:
> Hi, I have to choose a search engine for a medium-to-large site with a
> lot of searches and inserts happening at the same time. Can you suggest
> something? I'm thinking about Ferret, but I read that it has some
> problems with this kind of "work" :(

Ferret is getting better and better at this. The latest version still
has a couple of bugs, but the current working version is very stable
with multiple processes accessing the index. I've just stress-tested
it with 10 search processes and 1 writer process for 24 hours without
any problems. I will definitely have this release out before Monday. I
think the next version will be perfect for what you are talking
about.

solrb is also a good option, although it will be a little slower and
you'll have to run Java on your server (not that this is a big deal).
mix (Guest)
on 2007-03-02 14:44
> cut

ok :)
Another question about Ferret: is it possible to do two kinds of search?
A normal one (which includes the search text and one other field) and an
advanced one (which has more options to select, some or all of them)?
Jens K. (Guest)
on 2007-03-02 16:43
(Received via mailing list)
On Fri, Mar 02, 2007 at 01:44:45PM +0100, mix wrote:
> > cut
>
> ok :)
> Another question about Ferret: is it possible to do two kinds of search?
> A normal one (which includes the search text and one other field) and an
> advanced one (which has more options to select, some or all of them)?

That's no problem at all. You can build very complex and field-specific
queries, as well as issue a simple 'give me all docs where term xyz is
in any field' query.
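
To make the two kinds of search concrete, here is a plain-Ruby sketch that does not use Ferret's API at all. The documents, field names and helper methods are invented for illustration; a real index would answer both query types far more efficiently.

```ruby
# Toy documents with named fields; the field names are made up for this sketch.
BOOKS = [
  { id: 1, title: 'Best of Open Source', author: 'Smith',   category: 'software' },
  { id: 2, title: 'Lucene in Action',    author: 'Hatcher', category: 'search' },
  { id: 3, title: 'Cooking at Home',     author: 'Source',  category: 'food' }
].freeze

# True when the field's text contains the term as a whole word, ignoring case.
def field_matches?(value, term)
  value.to_s.downcase.split(/\W+/).include?(term.downcase)
end

# "Normal" search: the term may appear in any field except the id.
def search_any_field(docs, term)
  hits = docs.select do |doc|
    doc.any? { |field, value| field != :id && field_matches?(value, term) }
  end
  hits.map { |doc| doc[:id] }
end

# "Advanced" search: every given field => term pair must match.
# Options the user left empty are simply not passed in.
def search_fields(docs, criteria)
  hits = docs.select do |doc|
    criteria.all? { |field, term| field_matches?(doc[field], term) }
  end
  hits.map { |doc| doc[:id] }
end

puts search_any_field(BOOKS, 'source').inspect
puts search_fields(BOOKS, title: 'source', category: 'software').inspect
```

The advanced form accepts any subset of fields, which matches the "part or all of them" case from the question.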

Jens

Erik H. (Guest)
on 2007-03-02 17:14
(Received via mailing list)
On Mar 1, 2007, at 2:18 PM, mix wrote:
> Hi, I have to choose a search engine for a medium-to-large site with a
> lot of searches and inserts happening at the same time. Can you suggest
> something? I'm thinking about Ferret, but I read that it has some
> problems with this kind of "work" :(

I was lurking on this thread until Dave mentioned solrb.  First of
all, I *love* Ferret.  Dave is amazing, and the performance is
fantastic.  I have been groping for a Lucene in Ruby for a long time,
even starting to tinker with it in a low-level, pure-Ruby way myself.

When Solr came along I knew this hit the sweet spot I was looking
for.  It's all the greatness of Java Lucene, which is continually and
rapidly being improved by many folks.  Above and beyond just wrapping
Lucene behind an HTTP interface, it adds a ton of great features on
top: caching, replication, faceting, highlighting, and an incredibly
active community.  My expertise is in Java Lucene, so it felt right
to me.  We've started a project called solr-ruby (used to be named
solrb, but we renamed it to be more readable and pronounceable) which
provides a Ruby API to Solr.  For example (from
<http://wiki.apache.org/solr/solr-ruby>):

   # connect to the solr instance
   conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

   # add a document to the index
   conn.add(:id => 123, :title_text => 'Lucene in Action')

   # update the document
   conn.update(:id => 123, :title_text => 'Solr in Action')

   # print out the first hit in a query for 'action'
   response = conn.query('action')
   print response.hits[0]

   # iterate through all the hits for 'action'
   conn.query('action') do |hit|
     puts hit.inspect
   end

   # delete document by id
   conn.delete(123)

On top of solr-ruby, we've also been building Solr Flare, a Rails-
based front-end that presents a faceted and full-text search
interface, including integration with SIMILE Exhibit and Timeline,
and eventually also having Atom feeds, saved searches, etc.

While I certainly don't want to steal any thunder from Ferret,
because I think it is a great project, I feel compelled on this
thread to bring up what I consider a top-notch alternative to Ferret.

It would be very interesting to run some benchmarks comparing the two
at a few levels:  indexing speed, plain full-text query speed, and,
most important to my work, the speed of generating facet
information along with a query.

  Erik
mix (Guest)
on 2007-03-02 17:20
Jens K. wrote:
> On Fri, Mar 02, 2007 at 01:44:45PM +0100, mix wrote:
>> > cut
>>
>> ok :)
>> Another question about Ferret: is it possible to do two kinds of search?
>> A normal one (which includes the search text and one other field) and an
>> advanced one (which has more options to select, some or all of them)?
>
> That's no problem at all. You can build very complex and field-specific
> queries, as well as issue a simple 'give me all docs where term xyz is
> in any field' query.
>
> Jens


Perfect, I think I'll go with Ferret and acts_as_ferret :)
I've also found this tutorial, which seems very good:
http://www.railsenvy.com/2007/2/19/acts-as-ferret-tutorial
Thanks
mix (Guest)
on 2007-03-02 19:18
Just one last question :)
For example, if there is a book named "best of open source" and I search
for something like "source open", "best source" or "source best", Ferret
will find it, won't it?
Jens K. (Guest)
on 2007-03-02 19:50
(Received via mailing list)
On Fri, Mar 02, 2007 at 06:18:14PM +0100, mix wrote:
> Just one last question :)
> For example, if there is a book named "best of open source" and I search
> for something like "source open", "best source" or "source best", Ferret
> will find it, won't it?

Usually it will. You can however construct queries that take the order
of query terms into account, if you need that.
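
The difference between the two behaviours can be sketched in plain Ruby. This is not Ferret's API; the helper names are invented, and the unordered variant assumes every query term is required, which may differ from how a given index is configured.

```ruby
# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Unordered match: every query term occurs somewhere in the title.
def all_terms_match?(title, query)
  title_terms = tokenize(title)
  tokenize(query).all? { |term| title_terms.include?(term) }
end

# Ordered match: the query terms occur next to each other, in the same order.
def phrase_match?(title, query)
  title_terms = tokenize(title)
  query_terms = tokenize(query)
  return false if query_terms.empty?

  title_terms.each_cons(query_terms.size).any? { |window| window == query_terms }
end

title = 'best of open source'
['source open', 'best source', 'open source'].each do |query|
  unordered = all_terms_match?(title, query)
  phrase = phrase_match?(title, query)
  puts "#{query}: unordered=#{unordered} phrase=#{phrase}"
end
```

With the unordered check all three sample queries match the title, while the ordered check accepts only "open source".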

Jens

mix (Guest)
on 2007-03-02 19:59
Jens K. wrote:
>
> Usually it will. You can however construct queries that take the order
> of query terms into account, if you need that.
>
> Jens
>

Perfect :) OK, just one last thing: what about case sensitivity? If
I have a book "Open SOURCE", will a search for "source" find it?
Thanks :)
David B. (Guest)
on 2007-03-03 00:50
(Received via mailing list)
On 3/3/07, mix <removed_email_address@domain.invalid> wrote:
> thanks :)
Yes. You can do both case-sensitive and case-insensitive searches in
Ferret, depending on how you set up your analyzer, but searches are
case-insensitive by default, so a search for "source" will find "SOURCE".
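
As a rough illustration of why the analyzer decides this, here is a toy inverted index in plain Ruby. `TinyIndex` and the two lambdas are hypothetical and unrelated to Ferret's real analyzer classes; the point is only that the same normalization runs at index time and at query time.

```ruby
# Toy inverted index; the analyzer is any callable that turns text into tokens.
class TinyIndex
  def initialize(analyzer)
    @analyzer = analyzer
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Index a document: run the analyzer and record the id under each token.
  def add(id, text)
    @analyzer.call(text).each { |term| @postings[term] << id }
  end

  # The query goes through the same analyzer, so both sides are normalized
  # alike. Returns the ids of documents containing every query token.
  def search(query)
    terms = @analyzer.call(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, []) }.reduce(:&).uniq
  end
end

LOWERCASING  = ->(text) { text.downcase.scan(/\w+/) } # case-insensitive
CASE_KEEPING = ->(text) { text.scan(/\w+/) }          # case-sensitive

insensitive = TinyIndex.new(LOWERCASING)
insensitive.add(1, 'Open SOURCE')
sensitive = TinyIndex.new(CASE_KEEPING)
sensitive.add(1, 'Open SOURCE')

puts insensitive.search('source').inspect
puts sensitive.search('source').inspect
```

With the lowercasing analyzer the query "source" finds the document; with the case-keeping one only "SOURCE" does.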
marco (Guest)
on 2007-03-03 14:10
David B. wrote:
> On 3/3/07, mix <removed_email_address@domain.invalid> wrote:
> Yes. You can do both case-sensitive and case-insensitive searches in
> Ferret, depending on how you set up your analyzer, but searches are
> case-insensitive by default, so a search for "source" will find "SOURCE".

Perfect :)
Just one last question, I promise :)
How does it cope with an index of 5-10 GB? I have to store some
information in the index to use highlighting and to run any query.
Jens K. (Guest)
on 2007-03-05 00:16
(Received via mailing list)
On Sat, Mar 03, 2007 at 01:10:22PM +0100, marco wrote:
> David B. wrote:
> > On 3/3/07, mix <removed_email_address@domain.invalid> wrote:
> > Yes. You can do both case-sensitive and case-insensitive searches in
> > Ferret, depending on how you set up your analyzer, but searches are
> > case-insensitive by default, so a search for "source" will find "SOURCE".
>
> Perfect :)
> Just one last question, I promise :)
> How does it cope with an index of 5-10 GB? I have to store some
> information in the index to use highlighting and to run any query.

Try it out :-) I haven't used such a large index yet, but I think Ferret
will be able to handle it just fine.

Jens

mix (Guest)
on 2007-03-05 00:49
Jens K. wrote:
> Try it out :-) I haven't used such a large index yet, but I think Ferret
> will be able to handle it just fine.
>

I hope to reach that size :)