Dump and load functionalities? Test patch provided

Hello everyone,

We need to create backups of our index, but there are a few
constraints:

  • our application shouldn’t go offline for that
  • it has to be done quickly

Ferret doesn’t seem to have this kind of functionality
(though I’m very new to Ferret, so I may be wrong), and
I figured that I couldn’t do it in plain Ruby
(it’s way too slow; try it with a 2,000,000+ document index),
so the only choice left was to build support for it
into Ferret itself.

I added these two features:

  • IndexReader#dump("file")
    Dumps the whole index to a binary, non-portable file.

  • IndexWriter#load("file")
    Loads this file and appends it to the current index.

I wrote a somewhat dirty patch for Ferret 0.10.14 (it works
with 0.10.13 too). You can find it here:

Parked at Loopia

The fact that the dump file format is binary and homemade
doesn’t really matter to me, as long as it’s fast, but it’s
probably not very safe either (as far as security checks in my
code are concerned). Basically, the dump file format for one document is:

<int - number of hash entries for document #0 (2)>
<int - size of key>
<char [] - key data (“id”)>
<int - size of value>
<char [] - value data (“test”)>
<int - size of key>
<char [] - key data (“foo”)>
<int - size of value>
<char [] - value data (“bar”)>
<int - number of hash entries for document #1>

  • “int” is the C integer in the native endianness and size,
    so the file can only be loaded safely on the same architecture.
  • hash keys are converted from/to symbols during dump/load.
  • strings are stored without a trailing \0.
  • sizes are in bytes.
  • of course, it’s all packed together; it’s binary.
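
To make the layout above concrete, here is a minimal pure-Ruby sketch that
writes and reads it. This is not the patch itself (which lives in Ferret's C
code), and it would be far too slow for a real index; it only illustrates the
record format. The helper names (dump_documents, load_documents, and so on)
are made up for this example, and treating values as UTF-8 on load is an
assumption, since the format stores raw bytes only.

```ruby
require "stringio"

INT = "i"                                # native-size, native-endian C int
INT_SIZE = [0].pack(INT).bytesize

def write_int(io, n)
  io.write([n].pack(INT))
end

# Returns nil on a clean end of file, raises if an int is cut short.
def read_int(io)
  bytes = io.read(INT_SIZE)
  return nil if bytes.nil?
  raise EOFError, "truncated int" if bytes.bytesize < INT_SIZE
  bytes.unpack1(INT)
end

# <int size in bytes><raw bytes>, no trailing \0.
def write_string(io, str)
  data = str.to_s.b
  write_int(io, data.bytesize)
  io.write(data)
end

def read_string(io)
  size = read_int(io)
  raise EOFError, "truncated record" if size.nil?
  raise ArgumentError, "negative size" if size < 0
  data = io.read(size)
  raise EOFError, "truncated string" if data.nil? || data.bytesize != size
  data.force_encoding("UTF-8")           # assumption: values are UTF-8 text
end

# docs is an array of hashes (symbol keys, string values), one per document.
def dump_documents(io, docs)
  docs.each do |doc|
    write_int(io, doc.size)              # number of hash entries
    doc.each do |key, value|
      write_string(io, key)              # symbol key stored as a string
      write_string(io, value)
    end
  end
end

def load_documents(io)
  docs = []
  while (count = read_int(io))
    doc = {}
    count.times do
      key = read_string(io).to_sym       # keys become symbols again
      doc[key] = read_string(io)
    end
    docs << doc
  end
  docs
end

# Round trip through an in-memory buffer.
io = StringIO.new(String.new)
dump_documents(io, [{ id: "test", foo: "bar" }, { id: "other" }])
io.rewind
p load_documents(io)
```

Because the "i" directive follows the platform's int size and byte order, a
file written this way has the same portability limits as the ones listed above.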

Now, about the feature itself: is there another, better way
to do this?
If not, could it find a place in Ferret (probably after
some code cleanup, or even with a portable file format)?

Also, I don’t really know what to do about locks or mutexes.
I didn’t put any in my code, and I couldn’t figure out how
Ferret handles thread safety. Any ideas?

Thanks,


Maz
Rift Technologies - http://rift.fr/

The easiest way would just be to copy/zip the directory that the index
is in.

maz wrote:

Hello everyone,

We need to create backups of our index, but there are a few
constraints:

  • our application shouldn’t go offline for that
  • it has to be done quickly

SNIP

Maz
Rift Technologies - http://rift.fr/

Sam G. wrote:

The easiest way would just be to copy/zip the directory that the index
is in.

Isn’t there a risk of data loss? I mean, if Ferret is already using
the index, I can’t just copy it like that because of locks and
buffered data that may leave the index in a bad state.


Maz
Rift Technologies - http://rift.fr/