I’m trying to index ~130,000 documents [soon to grow to about 500,000
documents] and I’m wondering if it’s possible to combine Ferret indexes
or in some other way split up the building process.
Normally, indexing 130k documents wouldn’t be that painful, except that
there are different types of links between these documents and they are
not absolute [so for example doc a refers to a document b, but there are
multiple different documents labelled document a and document b, and to
prevent false links I have to use some fairly computationally intensive
heuristics].
If it’s not possible to split up the building of a Ferret index I’ll
probably resolve the links into absolute links as a separate part of the
process [which I can split up] and then build the Ferret index on one
machine after that.
On Mon, Nov 20, 2006 at 03:52:21AM +0100, Holden Karau wrote:
If it’s not possible to split up the building of a Ferret index I’ll
probably resolve the links into absolute links as a separate part of the
process [which I can split up] and then build the Ferret index on one
machine after that.
Only one process or thread may write to the index at once, so you’ll
have to serialize your writing to the index somehow, e.g. by gathering
the data on two machines (or threads) and handing it over to a single
indexer.
Jens
–
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66
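
A minimal sketch of the "gather in parallel, write from one place" idea described above, using only Ruby's standard library. Everything named here is an assumption for illustration: `resolve_links` is a hypothetical stand-in for the expensive link heuristics, and `index` is any object that responds to `<<` (a plain Array in this sketch; presumably a Ferret index in practice).

```ruby
require 'thread'  # provides Queue on older Rubies; harmless on newer ones

# Hypothetical placeholder for the expensive link-resolution heuristics.
# Here it only tags the document so the flow can be followed.
def resolve_links(doc)
  doc.merge(:links_resolved => true)
end

# Runs `docs` through `worker_count` resolver threads and a single writer.
# `index` is assumed to be anything that responds to `<<`.
def build_index(docs, index, worker_count = 2)
  pending  = Queue.new  # raw documents waiting for link resolution
  resolved = Queue.new  # documents ready to be written

  docs.each { |doc| pending << doc }
  worker_count.times { pending << :done }  # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      while (doc = pending.pop) != :done
        resolved << resolve_links(doc)
      end
      resolved << :done  # tell the writer this worker has finished
    end
  end

  # The only thread that ever touches the index.
  writer = Thread.new do
    finished = 0
    while finished < worker_count
      item = resolved.pop
      if item == :done
        finished += 1
      else
        index << item
      end
    end
  end

  workers.each(&:join)
  writer.join
  index
end
```

In MRI, threads do not run Ruby code in parallel, so for CPU-heavy heuristics the workers would more likely be separate processes or machines that hand their results to the writer via files or a network connection. The shape stays the same: many producers, exactly one index writer.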
Jens K. wrote:
[…]
Only one process or thread may write to the index at once, so you’ll
have to serialize your writing to the index somehow, e.g. by gathering
the data on two machines (or threads) and handing it over to a single
indexer.
Ferret newbie warning
Shouldn’t it be possible to use the add_indexes method to merge one or
more indexes?
http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035
Cheers!
Patrick
On Mon, Nov 20, 2006 at 09:04:21AM -0500, Patrick R. wrote:
Jens K. wrote:
[…]
Only one process or thread may write to the index at once, so you’ll
have to serialize your writing to the index somehow, e.g. by gathering
the data on two machines (or threads) and handing it over to a single
indexer.
Ferret newbie warning
Shouldn’t it be possible to use the add_indexes method to merge one or
more indexes?
http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035
interesting 
I didn’t ever try this, so if you do please let me know how it worked.
Jens
Jens K. wrote:
[…]
I didn’t ever try this, so if you do please let me know how it worked.
I just did the following in IRB:
require 'ferret'       # setup assumed; not shown in the original session
include Ferret::Index
i1 = Index.new
i2 = Index.new
i1 << {:text => 'one'}
i2 << {:text => 'two'}
i1.search_each("text:one") {|id, score| puts "#{i1[id][:text]}"}
=> "one"
i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
=> nil
i1.add_indexes i2
i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
=> "two"
Seems to work as advertised…
Cheers!
Patrick