Parallel indexing doesn't work?

Hi,

I’m trying to get parallelized ferret indexing working for my AAF
indices, based on the example in the O’Reilly Ferret shortcut.
However, the resulting indices after merging seem to have no actual
documents.

I went and made minimal changes to the example in the Ferret shortcut
pdf, and indeed can’t get that to work either. I’d appreciate any help
anyone can give! Thanks!

The example is below:

#!/usr/bin/env ruby

require 'rubygems'
require 'ferret'
include Ferret::Index

# Build five one-document indices under /tmp/0 .. /tmp/4.
5.times do |i|
  name = "index#{i}"
  puts name
  index = Ferret::I.new(:path => "/tmp/#{i}", :create => true)
  index << {:name => name}
  index.close
end

# Merge the partial indices into /tmp/test.
readers = []
readers << IndexReader.new("/tmp/0")
readers << IndexReader.new("/tmp/1")
readers << IndexReader.new("/tmp/2")
readers << IndexReader.new("/tmp/3")
readers << IndexReader.new("/tmp/4")
index_writer = IndexWriter.new(:path => "/tmp/test")
index_writer.add_readers(readers)
index_writer.close()
readers.each {|reader| reader.close()}

# Search the merged index.
i = Ferret::I.new(:path => '/tmp/test')
res = i.search('name*')
puts res.inspect # gives me: #<struct Ferret::Search::TopDocs
                 # total_hits=0, hits=[], max_score=0.0,
                 # searcher=#<Ferret::Search::Searcher:0x58a6ec>>

puts res.hits.size # gives me: 0

Hi!

It seems to me you’re indexing strings starting with 'index' but
searching for 'name'. Or was correcting this already one of your
minimal changes?

If not, try changing this line:

res = i.search('name*')
to
res = i.search('index*')
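
To spell out why the original query finds nothing: as far as I understand Ferret's query language, a bare name* is a prefix match against the indexed terms (here the values index0 to index4), not against the field name; restricting the search to the field would be written name:index*. The snippet below is only a standard-library analogy using File.fnmatch glob matching, not Ferret's query parser, and matching_terms is a made-up helper name.

```ruby
# Stand-in for the terms stored in the :name field by the example above.
terms = (0...5).map { |i| "index#{i}" }

# Hypothetical helper: glob-match a pattern against a list of terms.
# This is an analogy for a prefix query, not Ferret's implementation.
def matching_terms(pattern, terms)
  terms.select { |term| File.fnmatch(pattern, term) }
end

puts matching_terms('name*', terms).inspect   # no term starts with "name"
puts matching_terms('index*', terms).inspect  # every term starts with "index"
```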

cheers,
Jens

On Wed, Jan 09, 2008 at 04:02:17PM -0500, Noah M. Daniels wrote:



Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

Thanks, Jens. Good catch; this little example works correctly after
making that change.

However, my ActsAsFerret index merging does not work, and I’m
wondering if it’s something to do with AAF’s handling of documents in
an index?

Let’s call my indexed class Company…

Company.find_by_contents('*')
=> #<ActsAsFerret::SearchResults:0x2b1699108878 @current_page=nil,
@total_hits=3, @results=[], @total_pages=1, @per_page=3>

Yet on each partial index, prior to merging, that query returns a
bunch of results, as one would expect.

Here’s how I’ve built the merged index. Any idea why it is broken?

module FerretHelpers
  def merge_ferret_index_partitions(model)
    model_dir = File.basename(model.aaf_configuration[:ferret][:path])
    final_index_path = "/tmp/merged_parallel_ferret_index/#{model_dir}"
    partial_index_path = "/tmp/partial_indices/#{model_dir}"
    paths = Dir.glob("#{partial_index_path}/*")

    paths.each do |path|
      i = Ferret::I.new(:path => path, :create => true)
      name = path.split('/').last
      i << {:name => name}
      i.close
    end

    readers = []
    paths.each {|path| readers << IndexReader.new(path) }
    index_writer = IndexWriter.new(:path => final_index_path)
    index_writer.add_readers(readers)
    index_writer.close()
    readers.each {|reader| reader.close()}
    index = Ferret::Index::Index.new(:path => final_index_path)
    index.optimize
    index.close
  end
end

Hi!

On Wed, Jan 09, 2008 at 04:58:20PM -0500, Noah M. Daniels wrote:

> Ok, further update – there was an obvious and stupid bug in my code
> that was overwriting the partial indices. So now when that’s fixed, I
> get the proper number of results for a search:
>
> Company.find_by_contents('*')
> => #<ActsAsFerret::SearchResults:0x2b0902ab4628 @current_page=nil,
> @total_hits=247, @results=[], @total_pages=1, @per_page=247>

Strange. Did you try to access the merged index with plain Ferret to see
if that works? Additionally, are your partial indexes OK, and do they
deliver results with contents when you search only one of them?
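
A minimal sketch of such a check with plain Ferret, assuming the ferret gem is installed; inspect_index and the path in the usage note are made-up names, and the require is guarded only so the file still loads where the gem is missing. It prints the document counts and the stored fields of the first few documents.

```ruby
begin
  require 'rubygems'
  require 'ferret'
rescue LoadError
  warn 'ferret gem not available; inspect_index will fail if called'
end

# Hypothetical helper: dump document counts and the stored fields of the
# first `limit` documents of the index at `path`.
def inspect_index(path, limit = 5)
  reader = Ferret::Index::IndexReader.new(path)
  puts "num_docs: #{reader.num_docs}, max_doc: #{reader.max_doc}"
  [limit, reader.max_doc].min.times do |doc_id|
    next if reader.deleted?(doc_id)
    doc = reader[doc_id].load # LazyDoc#load pulls in all stored fields
    puts "#{doc_id}: #{doc.inspect}"
  end
ensure
  reader.close if reader
end
```

It could be called as inspect_index('/tmp/merged_parallel_ferret_index/company') once for the merged index and once for a partial index. If the partial index shows a stored :id field and the merged one does not, that would suggest the stored fields are lost during the merge rather than inside acts_as_ferret, though that is a guess to be confirmed.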

Cheers,
Jens


Hi, Jens,

I’ll try what you suggested with plain Ferret. What about
ferret_browser; what should I be looking for? The partial indices are
each fine prior to merging; they deliver results with contents when
searching only one of them.

Thanks again

Ok, further update – there was an obvious and stupid bug in my code
that was overwriting the partial indices. So now when that’s fixed, I
get the proper number of results for a search:

Company.find_by_contents('*')
=> #<ActsAsFerret::SearchResults:0x2b0902ab4628 @current_page=nil,
@total_hits=247, @results=[], @total_pages=1, @per_page=247>

However, why is @results empty?

Similarly, find_id_by_contents also returns empty documents, it seems:

Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil}]]

when I would have expected:

Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :score=>1.0, :id=>"189", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"2", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"192", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"4", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"6", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"7", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"8", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"37", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"13", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"21", :data=>{}}]]
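
To quantify the symptom rather than eyeball it, a small helper can count the hits that come back without an id. This is only a sketch: hits_missing_id is a made-up name, it assumes the [total_hits, hits] shape shown in the output above, and the sample data is a shortened copy of that output.

```ruby
# Hypothetical helper: return the hits whose :id is nil.
# Assumes `result` has the shape [total_hits, array_of_hit_hashes].
def hits_missing_id(result)
  _total, hits = result
  hits.select { |hit| hit[:id].nil? }
end

# Shortened copies of the two outputs shown above.
broken = [247, [{:model => 'Company', :data => {}, :score => 1.0, :id => nil},
                {:model => 'Company', :data => {}, :score => 1.0, :id => nil}]]
expected = [247, [{:model => 'Company', :score => 1.0, :id => '189', :data => {}},
                  {:model => 'Company', :score => 1.0, :id => '2', :data => {}}]]

puts "broken:   #{hits_missing_id(broken).size} hits without an id"
puts "expected: #{hits_missing_id(expected).size} hits without an id"
```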

Thanks for the help, and sorry for the silly previous bugs :)