Ferret quick dump&load

pyros · February 28, 2007, 10:48am

Hello Dave, Hello all,

Some weeks ago, Maz submitted a patch to quickly dump or load a Ferret
database in another format, to avoid index rebuilding (due to changes
in the format) or to simplify backup/replication tasks.

I am re-posting this patch (it was against 0.10.14) here:
Parked at Loopia

Do you think it’s interesting for Ferret?
Do you have any plans for this patch?

PS: Thanks a lot for the 0.11.2

pyros · February 28, 2007, 2:10pm

On 2/28/07, Florent S. wrote:

Do you think it’s interesting for Ferret?

Yes, I remember this patch. The problem I have with it is that it will
only back up stored fields, so it won’t work if you have unstored
fields. This might confuse people. As for rebuilding the index
when the format changes, this shouldn’t be necessary in future. I actually
store a version number in the segments files, so in future I will make
Ferret automatically upgrade the index when Ferret itself is updated. I could have
done it this time except that it would have been more work than I was
willing to do given that I’d already created a 5000 line patch just to
implement lockless commits. Given that Ferret was still in beta I
decided not to bother. It also would have added a lot of complexity
that I didn’t think was necessary. Once Ferret hits 1.0 however I will
make sure it remains backwards compatible.

As for backing up and replication, you can actually do this by copying
the whole index. You can either copy the index directly if it isn’t
being written to, or alternatively you can open it in another process
and add it to a new index. Something like:

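require 'ferret'
include Ferret::Index

# Merge the contents of the live index into a separate backup index.
backup_writer = IndexWriter.new(:path => 'path/to/backup_index')
reader = IndexReader.new('path/to/index')
backup_writer.add_readers([reader])
reader.close
backup_writer.close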
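
The first option above, copying the index directory directly while nothing is writing to it, could be sketched with Ruby's standard FileUtils. This is only an illustration: the helper name copy_index and both paths are hypothetical, and it assumes the index is a plain directory of files that no writer has open during the copy.

```ruby
require 'fileutils'

# Hypothetical helper: copy an index directory to a backup location.
# Assumes no process is writing to the index while the copy runs.
def copy_index(index_dir, backup_dir)
  raise ArgumentError, "no such index: #{index_dir}" unless File.directory?(index_dir)
  FileUtils.rm_rf(backup_dir)                  # start from a clean backup directory
  FileUtils.mkdir_p(File.dirname(backup_dir))  # make sure the parent directory exists
  FileUtils.cp_r(index_dir, backup_dir)        # recursive copy of every index file
  backup_dir
end
```

Calling copy_index('path/to/index', 'path/to/backup_index') replaces any previous backup at that location with a fresh copy.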

Do you have any plans for this patch?

If you still think this is useful and you are willing to submit some
unit tests with it I would be happy to commit it.

PS: Thanks a lot for the 0.11.2

You’re welcome,
Dave

pyros · February 28, 2007, 2:49pm

As usual, thanks for your helpful answer.

My problem is that in a week (maybe two) I will have a big index with
many documents. In my application, Ferret is the main data source, and
I will not be able to rebuild (or otherwise recover) the data because of
its life cycle.

With AAF, it’s easy to rebuild the index, because you have all your data
in a SQL DB, but for me it’s impossible. This is why I am looking for a
solution of this kind: to be able to quickly dump the index to another
format and then load it into the new index (with the new format).
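
A minimal sketch of what such a dump and load could look like, using only Ruby's standard library. It is not the patch discussed in this thread: the names dump_docs and load_docs are hypothetical, the one-JSON-object-per-line format is chosen here purely for illustration, and reading the documents out of the old index is left out. As noted earlier in the thread, only stored fields could be recovered this way.

```ruby
require 'json'

# Write each document (a Hash of stored field values) as one JSON line.
# `docs` is any Enumerable of Hashes; how they are read from the old
# index is outside this sketch. Returns the number of documents written.
def dump_docs(docs, io)
  count = 0
  docs.each do |doc|
    io.puts(JSON.generate(doc))
    count += 1
  end
  count
end

# Read a dump back, yielding one Hash per document so the caller can
# add it to the new index. Returns an Enumerator when no block is given.
def load_docs(io)
  return enum_for(:load_docs, io) unless block_given?
  io.each_line do |line|
    line = line.strip
    next if line.empty?
    yield JSON.parse(line)
  end
end
```

Because the dump is plain text, it does not depend on the index file format, so it could be written with the old version and read back with the new one. Field names come back as strings after the round trip.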

I’m not sure it’s the best way to do it, but I haven’t found another
solution. If you have any suggestions, they are welcome.

I know Ferret is still in beta, but I can’t wait for 1.0.