How to invoke cache sweeper from background jobs / models?

jason_sam · May 25, 2014, 12:49am

Hello list!

I need to expire fragment caches from a background job. The usual way to
expire caches is to create a cache sweeper and put the observer hooks
into the controller. That is fine as long as the database is only
modified through controller actions.

But in this case I have a background job importing data, and that needs
to invalidate fragment caches for the records it touches. The most
elegant way would be to be able to install the sweeping observer while
the background job is running, so it will expire the caches of all
touched objects.

Thinking further, sometimes from the Rails console, I’m starting manual
imports by invoking MyModel.import_all! on the model that’s going to
import data. Now, how would caches be expired here? Clearly, the MVC way
cannot work here.

Time to break the rules. So, what would be the best approach to handle
this? I’m seeing three ways with different downsides, but only one that
would solve the problem:

1. Expire fragments through observers the MVC way
   Downside: Background importers won’t purge caches because only
   controllers install sweepers/observers.
   Conclusion: This is no option.
2. Expire fragments through observers installed in controllers and
   background jobs
   Downside: No sweeping when running imports from console.
   Conclusion: Probably the cleanest solution but not a real option.
3. Expire fragments directly from the models
   Downside: Not the proposed MVC way, no support in Rails framework.
   Conclusion: The only solution that works in a DRY way for me.
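
For illustration, the third approach (expiring fragments directly from
the models) could look roughly like the plain-Ruby sketch below.
`FragmentStore`, `Article`, and the key format are hypothetical stand-ins,
not Rails APIs; in a real app the store would be `Rails.cache` and the
hook would be a model callback such as `after_commit`.

```ruby
# Hypothetical in-memory stand-in for a fragment cache store.
class FragmentStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end

  def delete(key)
    @data.delete(key)
  end
end

# Hypothetical model that expires its own fragment whenever it is saved,
# regardless of who triggered the save (controller, job, or console).
class Article
  attr_reader :id
  attr_accessor :title

  def initialize(id, title, store)
    @id = id
    @title = title
    @store = store
  end

  # Assumed key format, chosen only for this sketch.
  def fragment_key
    "articles/#{id}/summary"
  end

  def save
    # ... persist the record here ...
    @store.delete(fragment_key) # expire right next to the write
    true
  end
end

store = FragmentStore.new
article = Article.new(1, "Old title", store)
store.write(article.fragment_key, "<li>Old title</li>")

article.title = "New title"
article.save
store.read(article.fragment_key) # the stale fragment is gone
```

The trade-off is the one named above: the model now knows about a view
concern, but every write path expires the same keys.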

Looking at these points, I question the whole design behind sweepers and
their MVC voodoo. Forcing cache sweeping into controllers only looks
like a failed design to me, and I don’t see the reason for it. In the
end, it’s the data that makes up the views. Why should a controller
sweep its caches? Is it for performance reasons? Clearly it cannot be
about a controller clearing only its own views, because other
controllers could show results from the same models.

I often end up purging the caches from all controllers within the
observer. While that is DRY, it shows the misconception behind making
sweepers available to controllers only.

–
Replies to list only preferred.

codewerker · May 25, 2014, 4:52pm

MVC doesn’t mean that all your logic has to be in a model, view, or
controller. It sounds like you just need a class to do your import work,
which can then be called from a controller, background job, script,
migration, etc.
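
A minimal sketch of that suggestion, in plain Ruby: the hypothetical
`Importer` class owns the import and receives the cache-expiry step as an
injected callable, so every caller gets the same behavior. The names and
the hash-based "database" and "cache" are assumptions made for the
example.

```ruby
# Hypothetical importer that lives outside models, views, and controllers.
class Importer
  # records:   hash standing in for the persistence layer
  # on_change: any callable invoked with the id of each record touched
  def initialize(records, on_change:)
    @records = records
    @on_change = on_change
  end

  # Stores every row and reports each touched id; returns the row count.
  def import_all(rows)
    rows.each do |row|
      @records[row[:id]] = row[:attrs]
      @on_change.call(row[:id])
    end
    rows.size
  end
end

records = {}
cache = { "items/1" => "<li>stale</li>" }
expire = ->(id) { cache.delete("items/#{id}") }

# The same two lines work from a controller, a job, or the console.
importer = Importer.new(records, on_change: expire)
importer.import_all([{ id: 1, attrs: { name: "fresh" } }])
```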

codewerker · May 25, 2014, 7:27pm

On Saturday, May 24, 2014 6:48:50 PM UTC-4, Kai K. wrote:


A couple of responses. First, in Rails 4 this was changed. It now
incorporates generational caching. The cache key has a digest which is a
hash of the underlying template content, so that any changes in content
will bust the cache automatically. Observers have actually been removed
from Rails 4, although you can still get the functionality back by using
a gem.
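
To make the idea concrete, here is a plain-Ruby sketch of a key that
changes whenever either the record or the template changes. With
key-based expiration the record’s change timestamp is part of the key
too. The helper `fragment_key` and the key layout are assumptions for
illustration; the exact layout Rails generates differs.

```ruby
require "digest"

# Hypothetical helper: build a fragment key from a record's identity, its
# last-change timestamp, and a digest of the template source.
def fragment_key(model_name, id, updated_at, template_source)
  template_digest = Digest::MD5.hexdigest(template_source)
  "#{model_name}/#{id}-#{updated_at.to_i}/#{template_digest}"
end

template = "<li><%= item.name %></li>"
t0 = Time.at(1_400_000_000)

key_before = fragment_key("items", 1, t0, template)
# A data change bumps the timestamp, so the old entry is never read again.
key_after_data_change = fragment_key("items", 1, t0 + 60, template)
# A template edit changes the digest, with the same effect.
key_after_template_change = fragment_key("items", 1, t0, template + "<hr>")
```

Nothing is deleted explicitly; stale entries are simply never requested
again and have to be evicted by the cache store.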

Second, the architecture did make sense initially. In your typical Rails
app, you don’t want your database updated without going through the
controller. Even interfaces from other applications are, ideally,
processed through an API as JSON or XML. Caching is the least of the
issues; you want to ensure that all the constraints and security
established in the controller and model are applied. Fragment caching is
actually a view process and, in MVS, you don’t want to manage view
processes from the model; the controller is designed to be the one to
generate messages from one to the other. Therefore, you used to
generally have the controller generate a message to the view when
there’s a change in the underlying model that affects it. By the way,
there are things other than data that can change a view fragment. In
particular, a change in image file references that get incorporated into
the view comes to mind.

There are always situations that won’t fit into this, although they are
usually the exception and not the rule. You could have a resource that
you maintain only for information and that gets updated on some periodic
basis by a batch import. In that case, you are correct, you either have
to define a method within the model (which is most common) that may
cross the boundaries a bit.

codewerker · May 25, 2014, 7:32pm

On Sunday, May 25, 2014 1:26:38 PM UTC-4, mike2r wrote:

Sorry, that should be MVC, not MVS.

codewerker · June 21, 2014, 3:33am

mike2r wrote:

A couple of responses. First, in Rails 4 this was changed. It now
incorporates generational caching. The cache key has a digest which is a
hash of the underlying template content so that any changes in content
will
bust the cache automatically. Observers have actually been removed from
Rails 4 although you can still get the functionality back by using a gem.

Actually, that is what I’ve switched to now. At first, my concern was
with piling up a lot of stale and unused content in the cache. But I
fixed that by switching over to memcached as the cache store, because it
can be limited by size and has an LRU replacement policy.
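
Why a size limit plus LRU takes care of stale entries can be seen in a
small plain-Ruby sketch. `LruCache` is a hypothetical class written for
this example; memcached bounds the cache by memory rather than by entry
count, and the sketch counts entries only to stay short.

```ruby
# Hypothetical size-bounded cache with least-recently-used eviction.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @data = {} # Ruby hashes keep insertion order
  end

  def read(key)
    return nil unless @data.key?(key)
    value = @data.delete(key)
    @data[key] = value # re-insert to mark as most recently used
  end

  def write(key, value)
    @data.delete(key)
    @data[key] = value
    @data.shift while @data.size > @max_entries # drop oldest entries
    value
  end

  def keys
    @data.keys
  end
end

cache = LruCache.new(2)
cache.write("item/1-v1", "old")
cache.write("item/2-v1", "other")
cache.read("item/2-v1")
cache.write("item/1-v2", "new") # the stale "item/1-v1" is evicted
cache.keys # => ["item/2-v1", "item/1-v2"]
```

Entries under outdated keys are never read again, so they drift to the
old end and are the first to be dropped.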

To make use of this with fragment caching, I’ve started to change the
cache keys so that they always include the object’s change time, but
also additional metadata needed to detect cache invalidations, like the
change times of associated objects or the ids of parent objects. When
used together with Russian doll caching, this works very well now - and
after giving up on file-based caching, performance improved a lot
without bothering about infinite cache growth.
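
A rough plain-Ruby sketch of such a composite key follows. The `Record`
struct, the `composite_key` helper, and the key layout are all
hypothetical and only illustrate the principle; they are not the keys
used in the application described here.

```ruby
# Hypothetical record type: only the fields needed for key building.
Record = Struct.new(:name, :id, :updated_at, :parent_id)

# Build a key from the record itself, its parent id, and the newest change
# time among the record and its associated records.
def composite_key(record, associated = [])
  newest = ([record.updated_at] + associated.map(&:updated_at)).max
  parts = [record.name, record.id, record.updated_at.to_i]
  parts << "parent-#{record.parent_id}" if record.parent_id
  parts << "assoc-#{newest.to_i}"
  parts.join("/")
end

post = Record.new("posts", 5, Time.at(100), 2)
comment = Record.new("comments", 9, Time.at(200), 5)

composite_key(post, [comment]) # => "posts/5/100/parent-2/assoc-200"

# Touching the comment changes the post's key without touching the post.
comment.updated_at = Time.at(300)
composite_key(post, [comment]) # => "posts/5/100/parent-2/assoc-300"
```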

By the way, there are things other than data that can change a view
fragment. In particular, a change in image file references that get
incorporated into the view come to mind.

That is, of course, true. Actually, I really never want to bother with
view components from the model. That’s the point of MVC - and even if it
looks like it complicates things at first glance (especially to
beginners), it always results in a much cleaner design (at least if you
don’t try to exploit it), and thus in fewer bugs and easier-to-read
code.

But:

There are always situations that won’t fit into this, although they are
usually the exception and not the rule. You could have a resource that
you maintain only for information and gets updated on some periodic basis
by a
batch import. In that case, you are correct, you either have to define a
method within the model (which is most common) that may cross the
boundaries a bit.

There are situations which do not go through the controller - and
cannot. The controller is accessed through the action dispatcher. If I
have a background worker, it won’t go that route. It more or less
becomes a controller itself, one that does the job. So, it appears to me
that Rails is missing some glue between the actual dispatching and the
persistence layer, and that glue should not be the ActionController as
the sole component.

I’ve seen some people creating extensions to Rails that fill this gap
(mostly based on events or notifications/subscriptions; one example is
wisper). They actually look clean and like good ideas, but they are
still cumbersome to integrate into Rails, probably just because Rails
lacks native and well integrated support for this. The main idea behind
this concept is to insert a service layer between controllers and
models. I like that idea because I could use the same service layer from
background jobs. But I don’t think it would have solved the specific
problem I had with invalidating caches. And after some testing I found
that programmatically invalidating caches through some logic is
incredibly slow in Rails (defeating the purpose of having caches) when
you do huge batched updates to your data. Thus, I decided to go the
memcached route, which can do all the heavy lifting for me.
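
The service-layer idea can be sketched in a few lines of plain Ruby.
`EventHub` and `ImportService` are hypothetical names for this example,
and the code deliberately does not use wisper’s API; it only shows how a
subscription decouples the write from whatever reacts to it.

```ruby
# Hypothetical minimal publish/subscribe hub (not wisper's API).
class EventHub
  def initialize
    @subscribers = Hash.new { |hash, event| hash[event] = [] }
  end

  def subscribe(event, &handler)
    @subscribers[event] << handler
    self
  end

  def publish(event, payload)
    @subscribers[event].each { |handler| handler.call(payload) }
  end
end

# Hypothetical service used by controllers and background jobs alike.
class ImportService
  def initialize(hub)
    @hub = hub
  end

  def import(row)
    # ... write the row to the database here ...
    @hub.publish(:record_imported, row)
  end
end

expired = []
hub = EventHub.new
hub.subscribe(:record_imported) { |row| expired << "items/#{row[:id]}" }

ImportService.new(hub).import(id: 42)
expired # => ["items/42"]
```

Each published event still runs its handlers once per record, which fits
the observation above that per-record invalidation gets slow for huge
batches.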


codewerker · June 21, 2014, 3:51am

Josh J. wrote:

MVC doesn’t mean that all your logic has to be in a model, view, or
controller. It sounds like you just need a class to do your import work,
such can be called from a controller, background job, script, migration,
etc.

It is mostly done that way, though that class shares a module with the
model (the one that triggers copying one import item from the
preprocessed data table to my model’s table). It’s cleanly split up into
modules concerning import payloads, import sources, data mapping, etc.

The batch operations are parallelized with celluloid-pmap because we may
have tens of thousands of records to compare with existing records and
import from time to time, which involves downloading additional data
like images or attachments. Import workers can work in parallel by using
appropriate locks on the database, implicitly by using pmap, and
explicitly by just starting multiple workers in parallel. If the
database thread pool is handled correctly from within pmap, there’s no
problem with keeping the database busy without running out of
connections - occasionally there are lock timeouts, but those are now
handled gracefully by just retrying the working set until it has been
processed completely, without creating infinite loops.
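
One way to retry a working set without risking an infinite loop is to
bound the number of passes. The sketch below is plain Ruby under stated
assumptions: `LockTimeout`, `process_all`, and the `max_passes` limit are
hypothetical names, and the real importer may handle this differently.

```ruby
# Hypothetical error class standing in for a database lock timeout.
class LockTimeout < StandardError; end

# Process every item; items that hit a lock timeout are retried in later
# passes. max_passes bounds the loop so it cannot spin forever.
def process_all(items, max_passes: 5)
  pending = items.dup
  max_passes.times do
    break if pending.empty?
    pending = pending.reject do |item|
      begin
        yield item
        true # processed, drop from the working set
      rescue LockTimeout
        false # keep for the next pass
      end
    end
  end
  pending # whatever is left could not be processed
end

attempts = Hash.new(0)
leftover = process_all([1, 2, 3]) do |id|
  attempts[id] += 1
  raise LockTimeout if id == 2 && attempts[id] < 3 # simulated contention
end
leftover # => []
```

Items that are still pending after the last pass are returned to the
caller, so they can be reported instead of retried forever.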

It works pretty well now, it is mostly bullet-proof, and I’m confident
in its performance. Exceptions are caught and handled by feeding them
back to the import data table so they can be reviewed manually - and
either retried after applying a fix or submitted to a web form for
manual handling. Everything is wrapped in transactions so we don’t leave
half-done operations behind.

Still, such a class cannot expire caches in the way Rails designed its
cache expiration. As stated in the other post, cache expiration is now
handled outside of Rails by using proper cache keys and view digests,
and by using an LRU replacement policy and limiting the cache size
(which memcached does perfectly, thus I switched to that).

The web views now show data almost instantly on almost every request,
where we had up to 20 seconds before. The import process got a speedup
of factor 50 or so - I didn’t measure it. It’s just much faster now.
Generated pages are then additionally cached by a Varnish frontend
cache. Invalidating that properly, especially when using HTML5
manifests, can get tricky once in a while, but I’m working on that.
Maybe we could play with its edge-side includes a little bit to
parallelize page generation and break it into individual fragments, but
currently the page design does not allow us to make use of that
effectively.
