Hi,
I'm trying to get parallelized Ferret indexing working for my AAF (acts_as_ferret)
indices, based on the example in the O'Reilly Ferret shortcut.
However, the resulting indices after merging seem to have no actual
documents.
I made minimal changes to the example in the Ferret shortcut PDF, and I
can't get that to work either. I'd appreciate any help anyone can give!
Thanks!
The example is below:
#!/usr/bin/env ruby
require 'rubygems'
require 'ferret'
include Ferret::Index

# Build five single-document indexes in /tmp/0 .. /tmp/4
5.times do |i|
  name = "index#{i}"
  puts name
  index = Ferret::I.new(:path => "/tmp/#{i}", :create => true)
  index << {:name => name}
  index.close
end

# Merge the five indexes into /tmp/test
readers = (0...5).map { |i| IndexReader.new("/tmp/#{i}") }
index_writer = IndexWriter.new(:path => "/tmp/test")
index_writer.add_readers(readers)
index_writer.close
readers.each { |reader| reader.close }

# Search the merged index
i = Ferret::I.new(:path => '/tmp/test')
res = i.search('name*')
puts res.inspect
# gives me: #<struct Ferret::Search::TopDocs total_hits=0, hits=[], max_score=0.0,
#   searcher=#<Ferret::Search::Searcher:0x58a6ec>>
puts res.hits.size # gives me: 0
Hi!
It seems to me you're indexing strings starting with 'index', but you're
searching for 'name'. Or was correcting this already one of your
minimal changes?
If not, try changing this line:
res = i.search('name*')
to
res = i.search('index*')
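To see why 'name*' finds nothing, here is a toy model in plain Ruby (not Ferret's actual query code): a prefix query matches a document when one of the terms in its field values starts with the prefix. The field name :name is never a term itself, so only 'index*' matches the five test documents. The helper prefix_hits and the whitespace tokenization are simplifying assumptions for illustration.

```ruby
# Toy model of a prefix query (not Ferret's implementation): a document
# matches when any whitespace-separated term in any field value starts with
# the prefix. Field names are never treated as terms.
def prefix_hits(docs, prefix)
  prefix = prefix.downcase
  docs.select do |doc|
    doc.values.any? do |value|
      value.to_s.downcase.split.any? { |term| term.start_with?(prefix) }
    end
  end
end

# The same five documents the test script indexes
docs = (0...5).map { |i| { :name => "index#{i}" } }
puts prefix_hits(docs, "name").size   # prints 0
puts prefix_hits(docs, "index").size  # prints 5
```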
cheers,
Jens
On Wed, Jan 09, 2008 at 04:02:17PM -0500, Noah M. Daniels wrote:
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
–
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
Thanks, Jens. Good catch; this little example works correctly after
making that change.
However, my ActsAsFerret index merging still does not work, and I'm
wondering whether it has something to do with how AAF handles documents
in an index.
Let’s call my indexed class Company…
Company.find_by_contents('*')
=> #<ActsAsFerret::SearchResults:0x2b1699108878 @current_page=nil,
@total_hits=3, @results=[], @total_pages=1, @per_page=3>
Yet on each partial index prior to merging, that query returns a
bunch of results, as one would expect.
Now, here's how I've built that index. Any idea why the merged index
is broken?
module FerretHelpers
  def merge_ferret_index_partitions(model)
    model_dir = File.basename(model.aaf_configuration[:ferret][:path])
    final_index_path = "/tmp/merged_parallel_ferret_index/#{model_dir}"
    partial_index_path = "/tmp/partial_indices/#{model_dir}"
    paths = Dir.glob("#{partial_index_path}/*")

    # Reopen each partial index with :create => true (which replaces any
    # existing index at that path) and add one document named after it
    paths.each do |path|
      i = Ferret::I.new(:path => path, :create => true)
      name = path.split('/').last
      i << {:name => name}
      i.close
    end

    # Merge all partial indexes into the final index
    readers = paths.map { |path| Ferret::Index::IndexReader.new(path) }
    index_writer = Ferret::Index::IndexWriter.new(:path => final_index_path)
    index_writer.add_readers(readers)
    index_writer.close
    readers.each { |reader| reader.close }

    # Optimize the merged index
    index = Ferret::Index::Index.new(:path => final_index_path)
    index.optimize
    index.close
  end
end
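One small robustness point in the merge step: if add_readers raises, the partial-index readers are never closed. Below is a minimal sketch of a helper that always closes them, assuming only that each reader responds to close; with_readers is a hypothetical name, and the commented usage shows how it might wrap the IndexReader/IndexWriter calls from the method above.

```ruby
# Hypothetical helper: open one reader per path via open_reader, yield the
# list, and close every reader that was opened, even if opening a later
# reader or the block itself raises. Returns the block's value.
def with_readers(paths, open_reader)
  readers = []
  paths.each { |path| readers << open_reader.call(path) }
  yield readers
ensure
  readers.each(&:close) if readers
end

# Possible use inside merge_ferret_index_partitions:
#
#   opener = lambda { |path| Ferret::Index::IndexReader.new(path) }
#   with_readers(paths, opener) do |readers|
#     writer = Ferret::Index::IndexWriter.new(:path => final_index_path)
#     begin
#       writer.add_readers(readers)
#     ensure
#       writer.close
#     end
#   end
```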
Hi!
On Wed, Jan 09, 2008 at 04:58:20PM -0500, Noah M. Daniels wrote:
Ok, further update – there was an obvious and stupid bug in my code
that was overwriting the partial indices. So now when that’s fixed, I
get the proper number of results for a search:
Company.find_by_contents('*')
=> #<ActsAsFerret::SearchResults:0x2b0902ab4628 @current_page=nil,
@total_hits=247, @results=[], @total_pages=1, @per_page=247>
Strange. Did you try to access the merged index with plain Ferret to see
if that works? Additionally, are your partial indexes OK, and do they
deliver results with contents when you search only one of them?
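Following the suggestion to check the merged index with plain Ferret, here is a small sketch that lists the stored :id value of every non-deleted document. It relies only on max_doc, deleted?(n) and reader[n][:id], which I believe Ferret::Index::IndexReader provides (worth double-checking against the installed version); stored_ids is a hypothetical name. Running it on one partial index and then on the merged index would show whether the :id field survives the merge.

```ruby
# Hypothetical helper: return the stored :id of every live (non-deleted)
# document in an index reader. Assumes the reader responds to max_doc,
# deleted?(n) and [](n), and that each document responds to [](:id).
def stored_ids(reader)
  (0...reader.max_doc).reject { |n| reader.deleted?(n) }.map { |n| reader[n][:id] }
end

# Possible use against a real index (path as in merge_ferret_index_partitions):
#
#   reader = Ferret::Index::IndexReader.new(final_index_path)
#   ids = stored_ids(reader)
#   puts "#{ids.size} docs, #{ids.compact.size} with a stored :id"
#   reader.close
```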
Cheers,
Jens
Hi, Jens,
I'll try what you suggested with plain Ferret. What about
ferret_browser? What should I be looking for there? The partial indices
are each fine prior to merging; they deliver results with contents when
searching only one of them.
Thanks again
Ok, further update – there was an obvious and stupid bug in my code
that was overwriting the partial indices. So now when that’s fixed, I
get the proper number of results for a search:
Company.find_by_contents('*')
=> #<ActsAsFerret::SearchResults:0x2b0902ab4628 @current_page=nil,
@total_hits=247, @results=[], @total_pages=1, @per_page=247>
However, why is @results empty?
Similarly, find_id_by_contents seems to return empty documents:
Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil},
{:model=>"Company", :data=>{}, :score=>1.0, :id=>nil}]]
when I would have expected:
Company.find_id_by_contents('*')
=> [247, [{:model=>"Company", :score=>1.0, :id=>"189", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"2", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"192", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"4", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"6", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"7", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"8", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"37", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"13", :data=>{}},
{:model=>"Company", :score=>1.0, :id=>"21", :data=>{}}]]
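One way to quantify the symptom is to count the hits whose :id came back nil. The sketch below is plain Ruby over hit hashes shaped like the find_id_by_contents output above; hits_without_id is a hypothetical helper, and the sample data is copied from the two listings. If, as the outputs suggest, the Company records are looked up by these ids, hits without an id would be consistent with @results being empty even though total_hits is 247.

```ruby
# Hypothetical helper: count hits (hashes as returned by find_id_by_contents)
# that carry no usable :id.
def hits_without_id(hits)
  hits.count { |hit| hit[:id].nil? || hit[:id].to_s.empty? }
end

# Sample hits copied from the merged-index output and the expected output
merged = [{:model => "Company", :data => {}, :score => 1.0, :id => nil}] * 10
expected = [{:model => "Company", :score => 1.0, :id => "189", :data => {}},
            {:model => "Company", :score => 1.0, :id => "2", :data => {}}]

puts "#{hits_without_id(merged)} of #{merged.size} hits lack an id"  # prints "10 of 10 hits lack an id"
puts hits_without_id(expected)  # prints 0
```

In the live application this would be called on the second element of the pair, e.g. total, hits = Company.find_id_by_contents('*') followed by hits_without_id(hits).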
Thanks for the help, and sorry for the silly bugs earlier.