Using ID as Key

Hi,

I followed the howto to use keys for documents:

http://ferret.davebalmain.com/trac/wiki/HowTos#Howtousekeysfordocument

If I add two documents with the same id, only one gets added to the
index as expected. However, I have found the key and id do not match.
So, attempting to access the index with the id does not work.

For instance, when I run this search:

INDEX.search_each(query) do |doc, score|
  logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}")
end

The following is output:

Found doc: 3, id: 69
Found doc: 17, id: 88

Is this as designed or am I missing something?

Thanks,
Tom

On Jan 27, 2006, at 8:10 AM, Tom D. wrote:

INDEX.search_each(query) do |doc, score|
  logger.debug("Found doc: #{doc}, id: #{INDEX[doc]['id']}")
end

The following is output:

Found doc: 3, id: 69
Found doc: 17, id: 88

Is this as designed or am I missing something?

The doc variable in your code is what is known in Lucene as the
document “id”. This is an internal number used by the index. It has
no relation to the primary key feature that Ferret adds. You’ve
called your field “id”, which confuses things a bit.

The document id is subject to change if documents are deleted from the
middle of the index and the index is then optimized. So don’t rely on
the internal number for anything long-term.
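
To make the renumbering concrete, here is a toy model in plain Ruby. It is
not Ferret's implementation: the ToyIndex class and its add, delete,
optimize and doc_num_for methods are hypothetical names made up for this
illustration. The only assumption is that an internal number is a position
in the index, and that optimizing compacts away deleted positions.

```ruby
# Toy model only: this is NOT Ferret's code, just an illustration of why
# an internal document number can change while a stored field does not.
class ToyIndex
  def initialize
    @slots = [] # the array position plays the role of the internal doc number
  end

  # Append a document and return the internal number it was given.
  def add(doc)
    @slots << doc
    @slots.size - 1
  end

  # Mark a document as deleted; its slot stays in place until optimize.
  def delete(doc_num)
    @slots[doc_num] = nil
  end

  # Drop deleted slots, which renumbers every document after them.
  def optimize
    @slots.compact!
  end

  # Internal number of the document whose 'id' field equals value, or nil.
  def doc_num_for(value)
    @slots.index { |d| d && d['id'] == value }
  end
end

index = ToyIndex.new
index.add('id' => '69')
index.add('id' => '42')
index.add('id' => '88')

before = index.doc_num_for('88')
index.delete(index.doc_num_for('42'))
index.optimize
after = index.doc_num_for('88')

puts "id 88: internal number #{before} before optimize, #{after} after"
```

The stored 'id' field still reads 88 after the optimize, but the internal
number for that document has moved, which is why the field is the one to
keep in long-term references.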

Erik

Hi Erik,

Thanks for your response. Perhaps I am misunderstanding the howto,
but it implies that when you create an index and map the key to the id
as follows:

index = Index::Index.new(:key => :id)
index << {:id => 23, :data => "This is the data…"}
index << {:id => 23, :data => "This is the new data…"}

Then you can access this document by using either of the following:

index["23"] # Get document with key 23
index[23]   # Get document with internal number 23. It is NOT the key
            # field. It is just the internal Ferret id.

This implies that the id and key are the same, but according to my
first email example, they are not. Is this howto just misleading?
Based on what you said, the internal number will not necessarily match
the key.

Tom

Hi Tom,

I can see how this would be confusing. The internal id and the id you
give a document are unrelated; they’ll only match when you add
documents in order, starting with id 0. I’ll change the howto to
remove the confusion.
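
As a rough sketch of the distinction, the toy class below mimics the two
kinds of lookup from the howto: a String argument is treated as a key, an
Integer as an internal position. KeyedToyIndex is a hypothetical name and
this is plain Ruby, not Ferret. It also simplifies by compacting
immediately when a key is replaced, so its internal numbers will not
match what a real index hands out.

```ruby
# Toy model only: NOT Ferret's implementation. It shows that a key lookup
# and an internal-number lookup are two different questions.
class KeyedToyIndex
  def initialize(key)
    @key = key  # name of the field that acts as the primary key
    @docs = []  # the array position stands in for the internal number
  end

  # Add a document, replacing any existing document with the same key.
  def <<(doc)
    key_value = doc[@key].to_s
    @docs.reject! { |d| d[@key].to_s == key_value }
    @docs << doc
    self
  end

  # String argument: look up by key. Integer argument: look up by position.
  def [](arg)
    case arg
    when Integer then @docs[arg]
    when String  then @docs.find { |d| d[@key].to_s == arg }
    end
  end

  def size
    @docs.size
  end
end

index = KeyedToyIndex.new(:id)
index << { :id => 23, :data => 'This is the data' }
index << { :id => 23, :data => 'This is the new data' }
index << { :id => 7,  :data => 'Another document' }

by_key      = index['23'] # the document whose key is 23
by_position = index[23]   # nothing lives at internal position 23
first       = index[0]    # the key-23 document happens to sit at position 0

puts "size=#{index.size} by_key=#{by_key.inspect} by_position=#{by_position.inspect}"
```

In this toy the two numbers line up only if documents are added in key
order starting from 0 and nothing is ever replaced or deleted.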

Cheers,
Dave