Parallel Building?

I’m trying to index ~130,000 documents [soon to grow to about 500,000
documents] and I’m wondering if it’s possible to combine Ferret indexes
or in some other way split up the building process.

Normally, indexing 130k documents wouldn’t be that painful, except that
there are different types of links between these documents and they are
not absolute (for example, doc a refers to a document b, but there are
multiple different documents labelled document a and document b, and to
prevent false links I have to use some fairly computationally intensive
heuristics).

If it’s not possible to split up the building of a Ferret index, I’ll
probably resolve the links into absolute links as a separate part of the
process [which I can split up] and then build the Ferret index on one
machine after that.

On Mon, Nov 20, 2006 at 03:52:21AM +0100, Holden Karau wrote:

If it’s not possible to split up the building of a Ferret index, I’ll
probably resolve the links into absolute links as a separate part of the
process [which I can split up] and then build the Ferret index on one
machine after that.

Only one process or thread may write to the index at once, so you’ll
have to serialize your writing to the index somehow, i.e. gathering the
data on two machines (or threads) and hand it over to the indexer.
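
A minimal sketch of that hand-off, assuming the expensive link resolution can run in several workers while exactly one loop writes to the index. `resolve_links` and `index_with_single_writer` are hypothetical names, the resolution logic is a placeholder, and `index` is anything that responds to `<<` (a Ferret index does; a plain Array is used below so the sketch runs without Ferret). Note that MRI threads do not run Ruby code in parallel, so for CPU-bound heuristics the workers would more likely be separate processes or machines feeding the same kind of queue.

```ruby
require 'thread'

# Hypothetical stand-in for the computationally intensive link heuristics.
def resolve_links(doc)
  doc.merge(:links => doc[:links].map { |link| "resolved:#{link}" })
end

# Workers resolve links concurrently; only the calling thread writes to index.
def index_with_single_writer(docs, index, worker_count = 4)
  pending  = Queue.new   # raw documents waiting for link resolution
  resolved = Queue.new   # documents ready to be written

  docs.each { |doc| pending << doc }
  worker_count.times { pending << :done }   # one stop marker per worker

  workers = Array.new(worker_count) do
    Thread.new do
      while (doc = pending.pop) != :done
        resolved << resolve_links(doc)
      end
      resolved << :done                     # tell the writer this worker finished
    end
  end

  # The single writer: all index writes happen here, one at a time.
  finished = 0
  while finished < worker_count
    item = resolved.pop
    if item == :done
      finished += 1
    else
      index << item
    end
  end

  workers.each(&:join)
  index
end

docs = (1..10).map { |i| { :id => i, :links => ["doc#{i}"] } }
written = index_with_single_writer(docs, [], 3)
```

The order in which documents reach the index is not deterministic here, which should not matter for indexing but is worth knowing when debugging.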

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

Jens K. wrote:

[…]

Ferret newbie warning

Shouldn’t it be possible to use the add_indexes method to merge one or
more indexes?

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035

Cheers!
Patrick

On Mon, Nov 20, 2006 at 09:04:21AM -0500, Patrick R. wrote:

Jens K. wrote:
[…]

Only one process or thread may write to the index at once, so you’ll
have to serialize your writing to the index somehow, i.e. gathering the
data on two machines (or threads) and hand it over to the indexer.
Ferret newbie warning

Shouldn’t it be possible to use the add_indexes method to merge one or
more indexes?

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035

Interesting :)
I’ve never tried this, so if you do, please let me know how it works.

Jens



Jens K. wrote:

[…]

I just did the following in IRB:

require 'ferret'
include Ferret::Index

i1 = Index.new   # no :path given, so these are in-memory indexes
i2 = Index.new

i1 << {:text => 'one'}
i2 << {:text => 'two'}

i1.search_each("text:one") {|id, score| puts "#{i1[id][:text]}"}
=> "one"

i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
=> nil

# Merge i2 into i1; afterwards i1 can find i2's document.
i1.add_indexes i2
i1.search_each("text:two") {|id, score| puts "#{i1[id][:text]}"}
=> "two"

Seems to work as advertised…
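
Scaled up, the same idea might look like the sketch below: split the documents into shards, build one partial index per shard, then merge the partials into a final index with add_indexes. `build_in_shards` is a hypothetical helper, and `FakeIndex` is only a stand-in that mimics `<<` and `add_indexes` so the sketch runs without Ferret installed. The partial builds run one after another here; in practice each would run in its own process or on its own machine.

```ruby
# Stand-in so the sketch runs without Ferret; it mimics only << and add_indexes.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = []
  end

  def <<(doc)
    @docs << doc
    self
  end

  def add_indexes(other)
    @docs.concat(other.docs)
    self
  end
end

# new_index is a callable that returns a fresh, empty index.
def build_in_shards(docs, shard_count, new_index)
  slice_size = [(docs.size.to_f / shard_count).ceil, 1].max

  # Build one partial index per shard.
  partials = docs.each_slice(slice_size).map do |shard|
    partial = new_index.call
    shard.each { |doc| partial << doc }
    partial
  end

  # Merge every partial index into a single final index.
  merged = new_index.call
  partials.each { |partial| merged.add_indexes(partial) }
  merged
end

docs = [{:text => 'one'}, {:text => 'two'}, {:text => 'three'}]
merged = build_in_shards(docs, 2, lambda { FakeIndex.new })
```

With Ferret itself, the assumption is that the lambda would return `Ferret::Index::Index.new` instead, probably with a `:path` for each partial index so it can be built elsewhere and copied over before merging.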

Cheers!
Patrick

Patrick R. wrote:

Jens K. wrote:

[…]

Ferret newbie warning

Shouldn’t it be possible to use the add_indexes method to merge one or
more indexes?

http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html#M000035

Cheers!
Patrick

I can’t believe I missed that. I’ll give it a shot sometime over the
weekend, thanks :)