Per-field boost values - possible? working?

I’m making a simple business directory search, and I want to boost the
relevance of the ‘name’ field over the ‘address’ field; both are stored in
the same document in the same index.

Here is some console output showing what I am actually doing:

include Ferret::Document
=> Object

doc = Document.new
=> Document {
}

doc << Field.new(:name, "Business Search", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 2.0)
=> nil

doc << Field.new("physical_address", "New Zealand", Field::Store::YES, Field::Index::TOKENIZED, Field::TermVector::NO, false, 1.0)
=> nil

doc
=> Document {
stored/uncompressed,indexed,tokenized,<name:Business Search>
stored/uncompressed,indexed,tokenized,<physical_address:New Zealand>
}

I realise the docs say: “Note: this value is not stored directly with
the document in the index.” so I guess that’s why the boost field isn’t
shown here.

However, browsing the index in Luke shows that the boost value on each
field is still set to the default 1.0. Also empirical testing suggests
the boost value I’m entering isn’t taken into account at all.

Am I doing something wrong or is the boost functionality not working?

I’m running Ferret 0.9.4 with Ruby 1.8.2 on Debian sarge.

On 8/1/06, Sam G. [email protected] wrote:

I realise the docs say: “Note: this value is not stored directly with
the document in the index.” so I guess that’s why the boost field isn’t
shown here.

The boost isn’t shown here simply because I forgot to add it. It is
stored with the document when you create it. However, it isn’t stored
with the document in the index. It is stored in a “norms” file. There
is a norms file for every indexed field in the index (unless you chose
Field::Index::OMIT_NORMS) and the norms file contains a single byte
for every document in the index.
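
To make that layout concrete, here is a toy in-memory sketch. It assumes nothing about Ferret's real norms file format or API: the class and method names (`ToyNorms`, `omit_norms`, `set`, `get`, `size`) are hypothetical, and the byte values are placeholders. It only mirrors what is described above: one byte per document for each indexed field, and no norms at all for a field indexed with OMIT_NORMS.

```ruby
# Toy in-memory model of per-field norms. This is NOT Ferret's file format
# or API; it only mirrors the described layout: one byte per document for
# every indexed field, and nothing for fields that omit norms.
class ToyNorms
  def initialize
    @norms = {}  # field name => binary String holding one byte per document
    @omit  = {}  # fields indexed with the equivalent of OMIT_NORMS
  end

  def omit_norms(field)
    @omit[field] = true
  end

  # Store an already-encoded norm byte (0..255) for doc_id in field.
  def set(field, doc_id, norm_byte)
    return if @omit[field]
    bytes = (@norms[field] ||= "".b)
    # Grow the byte string with zero bytes so doc_id has a slot.
    bytes << ("\0".b * (doc_id + 1 - bytes.bytesize)) if doc_id >= bytes.bytesize
    bytes.setbyte(doc_id, norm_byte)
  end

  # Returns the norm byte, or nil if the field has no norm for doc_id.
  def get(field, doc_id)
    bytes = @norms[field]
    bytes.getbyte(doc_id) if bytes && doc_id < bytes.bytesize
  end

  # Size of a field's norms "file": one byte per document.
  def size(field)
    @norms.key?(field) ? @norms[field].bytesize : 0
  end
end

norms = ToyNorms.new
norms.omit_norms(:physical_address)
norms.set(:name, 0, 10)               # placeholder byte values
norms.set(:name, 1, 20)
norms.set(:physical_address, 0, 5)    # ignored: norms omitted for this field
puts norms.size(:name)                # one byte per document
puts norms.size(:physical_address)
```

The point of the sketch is the cost model: every indexed field with norms adds exactly one byte per document, which is why omitting norms on fields that never need boosting saves space.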

However, browsing the index in Luke shows that the boost value on each
field is still set to the default 1.0. Also empirical testing suggests
the boost value I’m entering isn’t taken into account at all.

I’m not sure why it doesn’t show up in Luke. The boost is definitely
working. I’m not sure what kinds of empirical tests you did. Try this:

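require 'rubygems'
require 'ferret'

include Ferret::Index
include Ferret::Document

index = Index.new   # no :path given, so the index is kept in memory
doc = Document.new

doc << Field.new(:name, "Business Search",
                 Field::Store::YES,
                 Field::Index::TOKENIZED,
                 Field::TermVector::NO)
index << doc        # doc 0: name field with the default boost of 1.0

doc.field(:name).boost = 2.0
index << doc        # doc 1: same content, name field boosted to 2.0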

puts "Explanation for Doc 0"
puts index.explain("business", 0)
puts ""
puts "Explanation for Doc 1"
puts index.explain("business", 1)

The explain method explains the score for a query and a particular
document. You’ll notice the score is doubled for the second document.

Hope that helps,
Dave

PS: Is anyone interested in porting Luke to Ruby? Luke won’t work on
future versions of the Ferret index. I’d be happy to help out, but I
don’t have time to do it by myself.

On Tue, Aug 01, 2006 at 02:09:36PM +0900, David B. wrote:
[…]

The only thing staying the same is the field norms files. Everything
else is changing so it wouldn’t be worth doing it in Java using any of
the existing Luke code. It’d have to be completely rewritten in Ruby.

I haven’t done any GUI stuff in Ruby before so I’m not sure which
library would be best. If anyone has any recommendations I could
probably start something and then others could play around with it.

I started porting Luke to Ruby/Gtk a while ago. It’s far from
complete, but I could make available what I have so far.

But don’t expect anything too fancy; I hadn’t done any GUI stuff
with Ruby or Gtk before that, either :wink:

Jens


webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer [email protected]
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66

On 8/1/06, Sam G. [email protected] wrote:

Thanks Dave,
Using the explain method proved it was definitely working. The boost
value I was using, 2.0, just wasn’t enough to change the placing in the
test I was using.

Great. One thing I neglected to mention was that the field_norm value
that you see in the Index#explain output is actually the field boost
(I may change the name as it’s not really clear). You’ll notice that
1.0 and 2.0 get converted to 0.625 and 1.25 respectively. This is
because the boost gets compressed into a single byte, so it loses
a lot of its precision. This is just something to keep in mind when
setting boost values.
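
The 0.625 and 1.25 can be reproduced with a short sketch, under two assumptions the thread does not state: that Ferret 0.9 encodes norms like Lucene's `SmallFloat.floatToByte315` (one byte holding a 5-bit exponent and 3 mantissa bits, counting the implicit leading bit), and that the stored norm is the field boost multiplied by Lucene's default length norm, 1/sqrt(number of terms). "Business Search" has two terms, so the values being encoded would be about 0.707 and 1.414, not 1.0 and 2.0. The helper names `norm_to_byte`, `byte_to_norm` and `field_norm` are hypothetical.

```ruby
# Sketch of a Lucene-style one-byte float encoding, ASSUMED to match what
# Ferret 0.9 does for norms. Values are truncated, not rounded.
ZERO_EXP = (63 - 15) << 3  # offset that maps the kept exponent range onto 0..255

# Encode a positive float into an unsigned byte (0..255), losing precision.
def norm_to_byte(f)
  return 0 if f <= 0.0
  bits = [f].pack('g').unpack('N').first  # IEEE 754 single-precision bits
  small = bits >> 21                       # keep sign, exponent, top 2 stored mantissa bits
  return 1 if small < ZERO_EXP             # underflow: smallest non-zero byte
  return 255 if small >= ZERO_EXP + 0x100  # overflow: largest byte
  small - ZERO_EXP
end

# Decode a byte back into the float it stands for.
def byte_to_norm(b)
  return 0.0 if b.zero?
  bits = (b << 21) + ((63 - 15) << 24)
  [bits].pack('N').unpack('g').first
end

# ASSUMED default: field boost times length norm 1/sqrt(number of terms).
def field_norm(boost, num_terms)
  boost / Math.sqrt(num_terms)
end

[1.0, 2.0].each do |boost|
  raw = field_norm(boost, 2)  # "Business Search" has 2 terms
  b = norm_to_byte(raw)
  puts "boost #{boost}: raw #{raw.round(4)} -> byte #{b} -> #{byte_to_norm(b)}"
end
```

Under these assumptions the decoded norms come out as 0.625 and 1.25, matching the explain output. The truncation also shows why small boost changes can vanish: for this two-term field, any boost from roughly 1.77 up to about 2.12 lands in the same byte and decodes to 1.25.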

What are the (highlights of the) changes to the index that make it
incompatible with Luke? Just wondering what would be involved…

The only thing staying the same is the field norms files. Everything
else is changing so it wouldn’t be worth doing it in Java using any of
the existing Luke code. It’d have to be completely rewritten in Ruby.

I haven’t done any GUI stuff in Ruby before so I’m not sure which
library would be best. If anyone has any recommendations I could
probably start something and then others could play around with it.

Cheers,
Dave

On 8/1/06, Jens K. [email protected] wrote:

I started porting Luke to Ruby/Gtk a while ago. It’s far from
complete, but I could make available what I have so far.

But don’t expect anything too fancy; I hadn’t done any GUI stuff
with Ruby or Gtk before that, either :wink:

Cool, I’d love to see it.
