Batch processing question

In my app a user can set the status of a Post to different flags like
‘v’ - visible, ‘d’ - mark for delete etc.

These flags are set via controller actions.

I have a batch process that runs and deletes all the posts marked for
deletion.

Post.find(:all, :conditions => ['status = ?', 'd']).each do |p|
  p.destroy
end

This batch process runs every x minutes.

Let's say a user marks a post with 'd', the batch process starts, and
while it is running the user changes the post back to 'v'. Inside the
batch process the record has already been selected for deletion and
will be destroyed when the loop reaches it, even though the controller
action has since changed its flag.
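To make the race concrete, here is a plain-Ruby sketch that assumes nothing from Rails: `Post` is a hypothetical `Struct` and `db` is a hash standing in for the posts table. The job takes its snapshot of 'd' posts up front, the way the `find(:all, ...)` call does, so a flag changed afterwards is never seen by the loop.

```ruby
# Hypothetical stand-in for the model; not ActiveRecord.
Post = Struct.new(:id, :status)

# Hypothetical "table": id => Post
db = { 1 => Post.new(1, "d"), 2 => Post.new(2, "d") }
destroyed = []

# The batch job loads its own copies of the 'd' posts up front,
# much like Post.find(:all, :conditions => ...) returns fresh objects.
snapshot = db.values.select { |p| p.status == "d" }.map(&:dup)

# Meanwhile a controller action flips post 2 back to visible.
db[2].status = "v"

# The loop works from the stale snapshot, so post 2 is destroyed anyway.
snapshot.each do |p|
  db.delete(p.id)
  destroyed << p.id
end

puts destroyed.inspect # => [1, 2]
```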

Ideally, if this happens, I would like the batch process not to delete
that post.

What’s the best way to handle this?

What I would do is add a condition so that the batch only deletes
objects marked before a given time. Posts marked after that time are
left alone, which gives the user time to change his mind if he is
editing the post while the batch is running.
One way to do it is to add a “marked_at” timestamp. A better one, in my
opinion, is an intermediate table/class “deletable_objects” that stores
a reference to the post (or to other classes, if you make it
polymorphic) and a “created_at” timestamp used in the find. Marking a
post for deletion would create the related object, and unmarking it
would destroy it.
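A minimal sketch of the cutoff idea, assuming a `marked_at` column. `PostStore` and its methods are hypothetical in-memory stand-ins for the model and the controller actions; in Rails 2.x the batch query would be roughly `Post.find(:all, :conditions => ['status = ? AND marked_at < ?', 'd', cutoff])`. The “deletable_objects” variant works the same way, filtering on that table's `created_at` instead.

```ruby
# Hypothetical model with a marked_at timestamp (assumed column).
Post = Struct.new(:id, :status, :marked_at)

# Hypothetical in-memory stand-in for the posts table.
class PostStore
  def initialize
    @posts = {}
  end

  def add(post)
    @posts[post.id] = post
  end

  def ids
    @posts.keys
  end

  # Controller action: mark for deletion and record when it happened.
  def mark_for_delete(id, now = Time.now)
    post = @posts.fetch(id)
    post.status = "d"
    post.marked_at = now
  end

  # Controller action: unmark; clearing marked_at also drops it from the batch.
  def mark_visible(id)
    post = @posts.fetch(id)
    post.status = "v"
    post.marked_at = nil
  end

  # Batch job: delete only posts marked before the cutoff; returns deleted ids.
  def purge_marked_before(cutoff)
    doomed = @posts.values.select do |p|
      p.status == "d" && p.marked_at && p.marked_at < cutoff
    end
    doomed.map(&:id).each { |id| @posts.delete(id) }
  end
end

store = PostStore.new
t0 = Time.utc(2010, 9, 5, 17, 0)
[1, 2, 3].each { |id| store.add(Post.new(id, "v", nil)) }
store.mark_for_delete(1, t0)        # marked well before the run
store.mark_for_delete(2, t0 + 600)  # marked just before the run
store.mark_for_delete(3, t0)        # marked early...
store.mark_visible(3)               # ...but the user changed his mind
deleted = store.purge_marked_before(t0 + 300) # cutoff: 5 minutes before the run
puts deleted.inspect # => [1]
```

This narrows the window rather than closing it: a post marked before the cutoff can still be flipped back while the job is running.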

Also, if you usually have thousands of posts to delete, you might
consider the “find_in_batches” method, which loads and processes only a
limited number of objects at a time instead of all of them, reducing
your server's memory load.
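A sketch of what batching does, in core Ruby only: `TABLE` and `fetch_marked` are hypothetical stand-ins for the posts table and a `WHERE status = 'd' AND id > ? ORDER BY id LIMIT ?` query. Each pass loads one batch and resumes after the last id seen, which is the general approach `find_in_batches` takes by walking the primary key.

```ruby
# Hypothetical posts table: id => status (even ids are marked 'd').
TABLE = (1..2_500).map { |id| [id, id.even? ? "d" : "v"] }.to_h

# Simulates: SELECT id FROM posts WHERE status = 'd' AND id > after_id
#            ORDER BY id LIMIT limit
def fetch_marked(after_id, limit)
  TABLE.select { |id, status| status == "d" && id > after_id }.keys.sort.first(limit)
end

# Yields at most batch_size ids at a time, resuming after the last id seen.
def each_marked_batch(batch_size)
  last_id = 0
  loop do
    batch = fetch_marked(last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last
  end
end

sizes = []
each_marked_batch(500) { |batch| sizes << batch.size }
puts sizes.inspect # => [500, 500, 250]
```

With ActiveRecord in Rails 2.3-era syntax, the equivalent would be roughly `Post.find_in_batches(:conditions => ['status = ?', 'd'], :batch_size => 500) { |posts| posts.each(&:destroy) }`.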

On Sun, Sep 5, 2010 at 5:44 PM, badnaam wrote:

What’s the best way to handle this?

Interesting. My first reaction is: why a batch job? Why not just delete
the thing when it's flagged and be done with it? :)

Second thought: how long does this batch job take? I would think it'd
be fast enough that the chance of a status conflict is pretty small.

But assuming you really need to do this:

Have each status = 'd' assignment put the post id on a job queue, and
have your batch job use that queue instead of your AR find. When a post
is set to status 'v', remove it from the queue. That shouldn't add much
overhead if you keep the queue in memory.
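A sketch of the in-memory queue, assuming the controller and the batch job run in the same Ruby process (an in-process queue is lost on restart and is not shared between app servers). `DeletionQueue` and its methods are hypothetical names. The job removes one id at a time under a lock, so a cancel that arrives mid-run takes effect for every post the job has not reached yet.

```ruby
require "set"

# Hypothetical in-process queue of post ids awaiting deletion.
class DeletionQueue
  def initialize
    @ids = Set.new
    @lock = Mutex.new
  end

  # Call when a post's status is set to 'd'.
  def enqueue(post_id)
    @lock.synchronize { @ids << post_id }
  end

  # Call when a post's status is set back to 'v'.
  def cancel(post_id)
    @lock.synchronize { @ids.delete(post_id) }
  end

  # Batch job: pop one id at a time; the lock is released before yielding,
  # so cancel can run while the job works on the current post.
  def drain
    loop do
      id = @lock.synchronize do
        next_id = @ids.first
        @ids.delete(next_id) if next_id
        next_id
      end
      break unless id
      yield id
    end
  end
end

queue = DeletionQueue.new
[1, 2, 3].each { |id| queue.enqueue(id) }
destroyed = []
queue.drain do |id|
  queue.cancel(3) if id == 1 # user flips post 3 back to 'v' mid-run
  destroyed << id             # stand-in for Post.find(id).destroy
end
puts destroyed.inspect # => [1, 2]
```

A cancel that lands between `drain` taking an id and the actual destroy still loses, so the window shrinks to one post rather than disappearing.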

FWIW,

Hassan S.
twitter: @hassan