Forum: Ferret How to make a single Writer

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Tom D. (Guest)
on 2006-03-05 22:49
(Received via mailing list)
I am using lighttpd with two procs and occasionally the .lock file
will not be properly removed by Ferret at which point my application
will end up throwing nothing but 500 errors.

Therefore, I have decided to go with a single writer thread... which
is probably a better long-term solution anyway.

I would like some feedback on the best way to structure this.  My app
is hosted on TextDrive, so drb (distributed ruby) is not allowed.

The only other solution I can come up with is to write all pending
updates to a shared file.  This could involve either:

1) serialize each object to a file using something like YAML, then
have the writer deserialize them during updates.

2) just write the ids that need to be updated in the index and then
read each object fresh from the database using its id when updating
the index.

I am leaning towards solution 2: it is easier to implement, should
be faster for writing to and reading from the intermediate file, and
makes it easier to drop duplicate index updates.  The only drawback to
2 is that it requires one additional database read for every index
update... but this could be minimized by batch reading with a WHERE id
IN (...).
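A rough sketch of what solution 2 could look like: the web processes append ids to a shared file, and the single writer drains it, de-duplicates, and batch-reads. The file name, the `records` table, and the helper names are all placeholders I made up for illustration, not anything from Ferret:

```ruby
# Placeholder name for the shared intermediate file.
PENDING_IDS_FILE = "pending_index_ids.txt"

# Called from the web processes whenever a record changes.
def enqueue_for_indexing(id)
  File.open(PENDING_IDS_FILE, "a") { |f| f.puts(id) }
end

# Called by the single writer: drain the file and return unique ids.
def drain_pending_ids
  return [] unless File.exist?(PENDING_IDS_FILE)
  ids = File.readlines(PENDING_IDS_FILE).map { |line| line.strip.to_i }
  File.truncate(PENDING_IDS_FILE, 0)  # empty the queue once read
  ids.uniq                            # duplicate updates collapse here
end

# Batch the reads into one query instead of one read per record.
def batch_sql(ids)
  "SELECT * FROM records WHERE id IN (#{ids.join(', ')})"
end
```

The `uniq` call is where the duplicate-update win shows up, and `batch_sql` is the single `WHERE id IN (...)` read mentioned above.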

Also, both 1 and 2 will require a lockfile for managing concurrent
access to the intermediate file.  I am thinking of just using this
lockfile library: http://raa.ruby-lang.org/project/lockfile/
Does anyone have any experience with this?
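For what it's worth, if the lockfile library doesn't pan out, Ruby's standard library already provides advisory locking via File#flock, which could guard the intermediate file directly. A minimal sketch (the helper name is mine, not from any library):

```ruby
# Run the block while holding an exclusive advisory lock on +path+.
def with_locked_file(path)
  File.open(path, File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX)   # blocks until any other holder releases it
    begin
      yield f
    ensure
      f.flock(File::LOCK_UN) # release even if the block raises
    end
  end
end

# Usage: both the web processes and the writer wrap their file access.
# with_locked_file("pending_index_ids.txt") { |f| f.puts(some_id) }
```

Advisory locks only work if every process cooperates by taking the lock, but that is true of a lockfile scheme as well.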

Thanks,
Tom
Alex Y. (Guest)
on 2006-03-05 23:08
(Received via mailing list)
Tom D. wrote:
> 2) just write the ids that need to be updated in the index and then
> read each object fresh from the database using its id when updating
> the index.
>
> I am leaning towards solution 2, as it is easier to implement, should
> be faster to write and read from the intermediate file and will be
> easier to remove duplicate index updates.  The only drawback to 2 is
> it will require one additional database read for every index update...
> but this could be minimized by batch reading with a where id in (...).
Why not add a needs_indexing column to your object table?  That way, not
only do you not have to care about concurrent intermediate file access
(because the DB takes care of that for you), but you can also do all
your pending database reads at once, if that's appropriate.  If you've
got a single writer thread, it can reset the flag either on all the
records once it's done, or on each one as it goes.  It seems much simpler all round
to me...  Of course, if you don't want to change your object table
schema, then you could create a separate table specifically for this.
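To make the flag idea concrete, here is a plain-Ruby simulation of the workflow Alex describes. In the real app `needs_indexing` would be a column on the object table and the select/update would be SQL; the `Record` struct and method names here are illustrative only:

```ruby
# Stand-in for a row with a dirty flag; in Rails this would be a model
# backed by a needs_indexing column.
Record = Struct.new(:id, :body, :needs_indexing)

# The writer's query: everything flagged since the last pass.
def pending_records(records)
  records.select(&:needs_indexing)
end

# Index the pending records, clearing each flag as we go
# (the "on each as it goes" variant).
def index_pending!(records)
  indexed = []
  pending_records(records).each do |r|
    indexed << r.id          # stand-in for the Ferret writer call
    r.needs_indexing = false # clear the dirty flag
  end
  indexed
end
```

Because the database serializes the flag updates, the web processes and the writer never need to coordinate through a shared file at all.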
Tom D. (Guest)
on 2006-03-05 23:41
(Received via mailing list)
That is an excellent idea, Alex.  Not sure why I didn't think of that :)

Basically, your concept is like adding a dirty flag to my table.

I like this approach much better.  However, for my particular case, I
will modify it slightly to just use the existing updated_at columns
that I have for each of my models that need indexing.  Then my index
writer won't have to lock the model database tables to reset the dirty
flag.  It will just keep track of the last time it updated the index.

Thanks for finding a much simpler solution.  That .lock file way was
making me nervous :)

Tom
Alex Y. (Guest)
on 2006-03-06 00:50
(Received via mailing list)
Tom D. wrote:
> Basically, your concept is like adding a dirty flag to my table.
Pretty much - it's dirty within a specific context.

> I like this approach much better.  However, for my particular case, I
> will modify it slightly to just use the existing updated_at columns
> that I have for each of my models that need indexing.  Then my index
> writer won't have to lock the model database tables to reset the dirty
> flag.  It will just keep track of the last time it updated the index.
Sounds good.  Just remember to record the *start* of the write, not the
end - otherwise you'll get records being marked as updated while your
write's happening, and they'll get missed by the next update.
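The caveat above can be sketched like this: take the cutoff timestamp before scanning, so any row updated while the pass runs still compares as newer than the saved cutoff and gets picked up next time. The `last_indexed_at` parameter and the hash-based record list are my own illustration, not from the thread:

```ruby
# One pass of the index writer, driven by updated_at timestamps.
def run_index_pass(records, last_indexed_at)
  pass_started_at = Time.now  # record the *start* of the write, not the end
  to_index = records.select { |r| r[:updated_at] > last_indexed_at }
  # ... hand to_index to the Ferret index writer here ...
  # Return the new cutoff; rows touched during the pass stay newer than it.
  [to_index.map { |r| r[:id] }, pass_started_at]
end
```

If the cutoff were taken at the end instead, a record updated mid-pass would look older than the cutoff and be silently skipped forever.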

> Thanks for finding a much simpler solution.  That .lock file way was
> making me nervous :)
No worries :-)