Best updating method

casper_the_ghost · June 15, 2006, 5:28pm

Hi All,

I have a Ferret index containing some cached RSS feeds.

I have a nightly cron script to cache the feeds, and I’d like to update
the index with the latest feeds.

I see the Index class has an update method, but I can’t work out how to
get the id of the relevant document to pass in.

Let's say I have a file called “google_news.xml”.

I want to go:

```ruby
my_index.update(google_id, google_doc)
```

I’m sure this is way too easy and I’m being massively dumb, but any
hints/advice would be gratefully received.

Many Thanks,
Steven

casper_the_ghost · June 15, 2006, 5:39pm

The way I usually handle updates like this is to store the filename in the
index as a separate field in the document. You can then search the index
for that filename, get the document number of the matching entry, and
update it accordingly.

casper_the_ghost · June 16, 2006, 9:03pm

In this case it sounds like the RSS feed URL is your natural primary key. You
could add an untokenized 'id' field to your documents and then retrieve and
update them using the URLs as keys. You could even use a more natural field
name than 'id' by creating the index with the optional :id_field parameter.
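Why the key field should be untokenized can be illustrated with a small plain-Ruby sketch. The `tokenize` helper is a hypothetical, deliberately crude regex tokenizer (Ferret's own analyzers split text differently), and the feed URL is made up. A tokenized URL is broken into several terms, so an exact-term lookup on the whole URL finds nothing; an untokenized field keeps the URL as a single term that matches exactly.

```ruby
# Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics.
# Hypothetical; Ferret's analyzers tokenize differently.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Terms a field would contribute to the index, depending on how it is indexed.
def terms_for(value, tokenized:)
  tokenized ? tokenize(value) : [value]
end

url = "http://news.example.com/rss/top.xml" # hypothetical feed URL

tokenized_terms   = terms_for(url, tokenized: true)
untokenized_terms = terms_for(url, tokenized: false)

p tokenized_terms                    # ["http", "news", "example", "com", "rss", "top", "xml"]
puts tokenized_terms.include?(url)   # false: the whole URL is not a term
puts untokenized_terms.include?(url) # true: an exact key lookup works
```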

Example:

```ruby
require 'ferret'

url = 'Peak Obsession'

# Use the 'url' field (instead of the default 'id') as the document key.
index = Ferret::Index::Index.new(:path => "#{RAILS_ROOT}/db/ferret",
                                 :id_field => 'url')

document = Ferret::Document::Document.new
# The key field is stored and UNTOKENIZED so it matches as a single term.
document << Ferret::Document::Field.new('url', url,
                                         Ferret::Document::Field::Store::YES,
                                         Ferret::Document::Field::Index::UNTOKENIZED)
document << Ferret::Document::Field.new('content', 'Rails are great!',
                                         Ferret::Document::Field::Store::YES,
                                         Ferret::Document::Field::Index::TOKENIZED)
index << document

# Look the document up by its key.
document = index[url]
puts document['url'] == url             # true

# Change it and write it back under the same key.
document['content'] = 'I agree'
index.update(url, document)

puts index[url]['content'] == 'I agree' # true
puts index.size == 1                    # true
```
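For the nightly cron job in the original question, the same keyed lookup gives a simple add-or-update loop. To stay runnable without Ferret, the sketch uses a hypothetical `KeyedIndex` that mimics only the calls from the example above (`[]` by key, `<<`, `update`, `size`), keyed on a configurable field much like the `:id_field => 'url'` option; the feed URLs and contents are made up. The loop checks for an existing document before choosing `update` or `<<` rather than assuming `update` adds missing documents, and because a Lucene-style index typically keeps both copies if the same key is added twice. With real Ferret you would build `Ferret::Document::Document` objects instead of hashes.

```ruby
# Hypothetical stand-in mimicking only the calls used in the Ferret example
# (index[key], index << doc, index.update(key, doc), index.size). NOT Ferret.
class KeyedIndex
  def initialize(id_field:)
    @id_field = id_field
    @docs = {} # key value => document Hash (a Hash keeps this toy simple)
  end

  def [](key)
    @docs[key]
  end

  def <<(doc)
    @docs[doc.fetch(@id_field)] = doc
    self
  end

  # Replace the document stored under +key+; raise if there is none,
  # so callers decide explicitly whether to add instead.
  def update(key, doc)
    raise KeyError, "no document with key #{key}" unless @docs.key?(key)
    @docs[key] = doc
  end

  def size
    @docs.size
  end
end

# Add a feed the first time it is seen; update it on later runs.
def index_feed(index, url, content)
  doc = { "url" => url, "content" => content }
  if index[url]
    index.update(url, doc)
  else
    index << doc
  end
end

index = KeyedIndex.new(id_field: "url")

# Two simulated nightly runs over the same (made-up) feeds.
[["day 1 news", "day 1 sport"], ["day 2 news", "day 2 sport"]].each do |news, sport|
  index_feed(index, "http://news.example.com/rss", news)
  index_feed(index, "http://sport.example.com/rss", sport)
end

puts index["http://news.example.com/rss"]["content"] # prints day 2 news
puts index.size                                       # prints 2
```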

--
Sergei S.
Red Leaf Software LLC
web: http://redleafsoft.com


casper_the_ghost · June 20, 2006, 7:20pm

Many Thanks - very helpful