How to make a single Writer


#1

I am using lighttpd with two procs, and occasionally Ferret fails to
remove its .lock file, at which point my application ends up throwing
nothing but 500 errors.

Therefore, I have decided to go with a single writer thread… which
is probably a better long term solution anyways.

I would like some feedback on the best way to structure this. My app
is hosted on TextDrive, so DRb (Distributed Ruby) is not allowed.

The only other solution I can come up with is to write all pending
updates to a shared file. This could involve either:

  1. serialize each object to a file using something like YAML and then
    have the writer deserialize them during updates.

  2. just write the ids that need to be updated in the index and then
    read each object fresh from the database using its id when updating
    the index.

I am leaning towards solution 2, as it is easier to implement, should
be faster to write and read from the intermediate file and will be
easier to remove duplicate index updates. The only drawback to 2 is
it will require one additional database read for every index update…
but this could be minimized by batch reading with a where id in (…).

Also, both 1 and 2 will require a lockfile for managing concurrent
access to the intermediate file. I am thinking of just using this
lockfile library: http://raa.ruby-lang.org/project/lockfile/
Does anyone have any experience with this?
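
A minimal sketch of option 2, for illustration only. It assumes the ids are kept one per line in a plain text file, and it uses Ruby's built-in File#flock in place of the lockfile library linked above (whose API is not shown here). The class name PendingIds and the file name are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: a file of pending record ids shared by all processes.
# File#flock (core Ruby) stands in for the lockfile library mentioned above.
class PendingIds
  def initialize(path)
    @path = path
  end

  # Called by any web process: append one id under an exclusive lock.
  def push(id)
    File.open(@path, File::WRONLY | File::APPEND | File::CREAT, 0o644) do |f|
      f.flock(File::LOCK_EX)
      f.puts(id)
    end # closing the file flushes the write and releases the lock
  end

  # Called only by the single writer: read every queued id, empty the
  # file, and return the ids with duplicates removed.
  def drain
    File.open(@path, File::RDWR | File::CREAT, 0o644) do |f|
      f.flock(File::LOCK_EX)
      ids = f.read.split.map { |line| Integer(line) }.uniq
      f.truncate(0)
      ids
    end
  end
end

# Usage: two updates to record 3 collapse into one index update.
dir = Dir.mktmpdir
queue = PendingIds.new(File.join(dir, "pending_ids.txt"))
[3, 7, 3].each { |id| queue.push(id) }
p queue.drain # => [3, 7]
```

The writer could then load the drained ids with a single "where id in (…)" query. flock is advisory, so this only works if every process goes through the same helper.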

Thanks,
Tom


#2

Tom D. wrote:

>   2. just write the ids that need to be updated in the index and then
>     read each object fresh from the database using its id when updating
>     the index.
>
> I am leaning towards solution 2, as it is easier to implement, should
> be faster to write and read from the intermediate file and will be
> easier to remove duplicate index updates. The only drawback to 2 is
> it will require one additional database read for every index update…
> but this could be minimized by batch reading with a where id in (…).

Why not add a needs_indexing column to your object table? That way, not
only do you not have to care about concurrent intermediate file access
(because the DB takes care of that for you), but you can also do all
your pending database reads at once, if that’s appropriate. If you’ve
got a single writer thread, it can reset the flag either on all records
once it’s done, or on each record as it goes. It seems much simpler all round
to me… Of course, if you don’t want to change your object table
schema, then you could create a separate table specifically for this.
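
A rough sketch of the flag idea, under stated assumptions: an in-memory array of structs stands in for the object table, and FakeIndex is a placeholder rather than Ferret's API. Only the needs_indexing name comes from the suggestion above; everything else is hypothetical.

```ruby
# Stand-in for a database row. In a real app this would be a model with a
# boolean needs_indexing column.
Record = Struct.new(:id, :title, :needs_indexing)

# Placeholder index (not Ferret's API): it just remembers what it was given.
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def update(id, title)
    @docs[id] = title
  end
end

# Hypothetical single-writer pass: read all flagged rows at once, index
# each one, and clear its flag as it goes. Returns the ids it indexed.
def index_pending(table, index)
  pending = table.select(&:needs_indexing) # like WHERE needs_indexing = true
  pending.each do |record|
    index.update(record.id, record.title)
    record.needs_indexing = false
  end
  pending.map(&:id)
end

table = [
  Record.new(1, "first", true),
  Record.new(2, "second", false),
  Record.new(3, "third", true)
]
index = FakeIndex.new
p index_pending(table, index) # => [1, 3]
p index_pending(table, index) # => []
```

With a real database the select would be one query and the flag reset one update per row (or one update for the whole batch).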


#3

That is an excellent idea, Alex. Not sure why I didn’t think of that :)

Basically, your concept is like adding a dirty flag to my table.

I like this approach much better. However, for my particular case, I
will modify it slightly to just use the existing updated_at columns
that I have for each of my models that need indexing. Then my index
writer won’t have to lock the model database tables to reset the dirty
flag. It will just keep track of the last time it updated the index.

Thanks for finding a much simpler solution. That .lock file way was
making me nervous :)

Tom


#4

Tom D. wrote:

> Basically, your concept is like adding a dirty flag to my table.

Pretty much - it’s dirty within a specific context.

> I like this approach much better. However, for my particular case, I
> will modify it slightly to just use the existing updated_at columns
> that I have for each of my models that need indexing. Then my index
> writer won’t have to lock the model database tables to reset the dirty
> flag. It will just keep track of the last time it updated the index.

Sounds good. Just remember to record the start of the write, not the
end - otherwise you’ll get records being marked as updated while your
write’s happening, and they’ll get missed by the next update.
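
To make the start-versus-end point concrete, here is a small sketch. It assumes an in-memory list of rows in place of the model tables and a block in place of the real index update; IndexedRow, IndexWriter and the now: parameter are hypothetical names used only for illustration.

```ruby
# Stand-in for a model row that has an updated_at column.
IndexedRow = Struct.new(:id, :updated_at)

# Hypothetical writer that remembers when its previous pass *started*.
class IndexWriter
  attr_reader :last_run_started_at

  def initialize
    @last_run_started_at = Time.at(0)
  end

  # rows stands in for the table; the block stands in for the index update.
  # Returns the ids that were handed to the block.
  def run(rows, now: Time.now)
    started_at = now # captured before anything is read
    changed = rows.select { |row| row.updated_at >= @last_run_started_at }
    changed.each { |row| yield row }
    @last_run_started_at = started_at # the start, not the time we finished
    changed.map(&:id)
  end
end

rows = [IndexedRow.new(1, Time.at(50)), IndexedRow.new(2, Time.at(90))]
writer = IndexWriter.new
p writer.run(rows, now: Time.at(100)) { |row| row } # => [1, 2]

# Row 3 is saved at t=105, while the pass that started at t=100 is running.
rows << IndexedRow.new(3, Time.at(105))
p writer.run(rows, now: Time.at(200)) { |row| row } # => [3]
```

Had the writer stored its finish time instead (say t=110), row 3 would never be picked up. Using >= can index a row twice when timestamps tie, which should be harmless for an index update.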

> Thanks for finding a much simpler solution. That .lock file way was
> making me nervous :)

No worries :)