Ideas for reimplementation of radiant caching

I want Radiant to have a more robust caching mechanism than the current
'expire every x minutes' method of doing things - I'd like it to remove
pages from the cache only when the things they depend on actually
change. I've roughed out a plan for how I think this can be done in a
safe manner.

It sounds like it should work, but this is mainly just what I thought of
on the way to work this morning - I'd like people to poke holes in it
and find any edge cases I'm missing - if this process EVER fails to
clear a cached page, it's pretty much useless.

I see a bit of overhead being added to the rendering process (since
after_initialize hooks are a performance no-no), but since this should
reduce the need for rendering by a huge amount, I'm hoping it'll balance
out.

I will probably put off an implementation of this until after I finish
off my asset extension and get other things running, but I might pull
this forward for asset caching as well… not sure yet.


Example Site layout:

Pages:
    + Homepage ('main', 'sidebar')
        + Articles ('main', 'sidebar')
            - Article1 ('full', 'extract')
            - Article2 ('full', 'extract')

Snippets:
    - Header
    - Footer

Layouts:
    - Default

Homepage:
<r:find url="articles">
  <r:children:each>
    <r:title/>
    <r:if_content part="extract">
      <r:content part="extract"/>
    </r:if_content>
  </r:children:each>
</r:find>

'default' layout:
<r:snippet name="header"/>
<r:content/>
<r:snippet name="footer"/>


In this example, the homepage needs to be re-rendered when:
- homepage is modified
- new article child is created
- article1/2 is updated/removed
- article1/2 has extract added/removed
- default layout is modified
- header snippet is modified
- footer snippet is modified
- (please tell me if I’m missing cases here)

To know all those things, the following should be more than enough to
keep track of:

- any page which is read from db during the render
- any snippet which is read from db during the render
- any layout which is read from db during the render

New Table:

create_table 'cache_dependencies' do |t|
    t.column :page_id, :integer
    t.column :depends_on, :integer
    t.column :cache_type, :string
end

Since this is just a scratch table, perhaps it should use something like
Madeleine, or maybe just a tree of files that are locked on write,
instead of living in the main database.

Filesystem cache:
pages/
    page.data
    page.cacheinfo

An after_save hook is added to each snippet/layout:

CacheDependencies.find_all_by_cache_type_and_depends_on('snippet', snippet.id).each do |cd|
  if File.exist?("pages/#{cd.page_id}.cacheinfo")
    File.delete("pages/#{cd.page_id}.cacheinfo")
  end
end

An after_save hook is added to each page to do the same, and an
after_create and an after_destroy are added to Page to clear anything
depending on the parent when a child is created or removed.
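
A rough sketch of those Page callbacks (nothing here is final, and the
names are placeholders):

class Page < ActiveRecord::Base
  after_save    :clear_dependent_caches
  after_create  :clear_parent_dependents
  after_destroy :clear_parent_dependents

  private

  # Any page whose render touched this page must be re-rendered.
  def clear_dependent_caches
    CacheDependencies.find_all_by_cache_type_and_depends_on('page', id).each do |cd|
      cacheinfo = "pages/#{cd.page_id}.cacheinfo"
      File.delete(cacheinfo) if File.exist?(cacheinfo)
    end
  end

  # Creating or removing a child invalidates anything that depended on the
  # parent (e.g. the homepage listing its articles).
  def clear_parent_dependents
    CacheDependencies.find_all_by_cache_type_and_depends_on('page', parent_id).each do |cd|
      cacheinfo = "pages/#{cd.page_id}.cacheinfo"
      File.delete(cacheinfo) if File.exist?(cacheinfo)
    end
  end
end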


A new class, CacheInfo, is registered as an 'after_initialize' hook on
Page, Layout and Snippet.

When rendering begins, initialize the CacheInfo object:
CacheInfo.reset

When a page/layout/snippet is loaded, its id is added to a hash in
CacheInfo.
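
Something along these lines is what I have in mind for CacheInfo (a
rough sketch only - names are placeholders):

# Collects the ids of everything read from the db while a page renders.
class CacheInfo
  @@depends = nil

  def self.reset
    @@depends = {}
  end

  # Called from the after_initialize hooks on Page, Snippet and Layout.
  def self.record(cache_type, id)
    return unless @@depends && id
    (@@depends[cache_type] ||= []) << id
  end

  def self.depends
    @@depends || {}
  end
end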

When the rendering is complete, the cacheinfo is written to the
cache_dependencies table, looking something like:

CacheDependencies.transaction do
  CacheDependencies.delete_all(['page_id = ?', page.id])
  page_depends.each do |pd|
    CacheDependencies.create(:page_id => page.id, :depends_on => pd,
      :cache_type => 'page')
  end
  snippet_depends.each do |pd|
    CacheDependencies.create(:page_id => page.id, :depends_on => pd,
      :cache_type => 'snippet')
  end
  layout_depends.each do |pd|
    CacheDependencies.create(:page_id => page.id, :depends_on => pd,
      :cache_type => 'layout')
  end
end

On each request, check whether the page.cacheinfo file exists:
- if it doesn't, render the page from scratch
- if it does:
    - send a 304 if the If-Modified-Since header is still valid
    - otherwise, send the cached page.data file
    - there will be a setting to use X-Sendfile, if it's available, to
      hand the heavy lifting off to apache/lighttpd
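
Roughly, that controller-side check would look something like this (a
sketch only - the helper names are made up):

def serve_with_cache(page_id)
  cacheinfo = "pages/#{page_id}.cacheinfo"
  data      = "pages/#{page_id}.data"

  if !File.exist?(cacheinfo)
    # a hook deleted the cacheinfo, so the cached copy is stale
    render_from_scratch(page_id)
  elsif not_modified_since?(File.mtime(data))
    # the client's copy is still current, no body needed
    render :nothing => true, :status => 304
  else
    send_file data, :type => 'text/html'  # or hand off via X-Sendfile when configured
  end
end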

A .cacheinfo file could also be created to contain a time-based expiry
for pages that use out-of-db data, pages that have time-sensitive data
(current date, etc), or some other cache-clearing condition.

I’ve made the assumption above that parts are never updated without a
page also being updated, but it should just be a case of adding an extra
item to the cache table to account for that.

Note that this is also extremely extensible - new tables in the database
can add their own after_initialize and after_save hooks to record
dependencies and clear cached pages without having to interact with the
rest of the caching system.
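
For example, my asset extension could plug into the same scheme like
this (hypothetical - the Asset model doesn't exist yet):

# In an extension: assets embedded in pages take part in the same scheme.
class Asset < ActiveRecord::Base
  # after_initialize needs to be defined as a method in this version of Rails
  def after_initialize
    CacheInfo.record('asset', id)
  end

  def after_save
    CacheDependencies.find_all_by_cache_type_and_depends_on('asset', id).each do |cd|
      cacheinfo = "pages/#{cd.page_id}.cacheinfo"
      File.delete(cacheinfo) if File.exist?(cacheinfo)
    end
  end
end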

Dan.

Dan,

It seems you have given this quite a bit of thought, and it could work.
I'm just worried about whether the ROI will end up being worth the
performance hit.

Just as an anecdote, I’ve worked with paid enterprise-grade CMS’s,
open-source CMS’s and I’ve only seen two types of caching:

  1. The classic: time expiry unless forced by hand.
  2. Tag-level caching: You can choose which tags NOT to cache. So for
    example, in a news site, pretty much the whole site could be cached,
    except for content produced by <%breaking-news%>.

Maybe something like that could be adapted for Radiant (i.e. give the
choice which caching is desired).

Dan,
This sounds like a pretty good system.
I'm not sure if my suggestions/concerns are directly related to caching,
but they kind of go with it.

The biggest hole in Radiant that I can see is that there are no proper
version management and publishing capabilities.
I work for a large state government organization and I have to use
Microsoft CMS 8 hours a day. It does lots of stuff completely
terribly, but one nice thing about it is it’s version
management/publishing system.

Here’s what I mean:
You have an existing page and you change some content; there is no way
to preview it or put it through an approval process other than
publishing it and viewing it live. Maybe this is by choice (I hope not),
but I could see how this is a major reason for lack of adoption on a
larger scale.

What I’d like to see:
After a page is changed, a new version is saved and the status for the
new version is set to draft. There’s a way to preview this page, and
then publish/approve the page and it would then be live. This is when
it would be “cached”.
This still allows for dynamic content and definitely takes Radiant to
the next level for me (dare I say, “enterprise ready”).

My $.02, I don’t know if you’ve considered it in your caching scheme.

  • BJ Clark

Hi Dan, Ruben,

Actually I once built a commercial enterprise CMS where a custom content
proxy would cache everything. It was flushed on time-out /and/ upon
changes in the content. The proxy allowed large companies to set up
caches wherever they pleased and still have a central repository. The
caching was fairly complicated, as we had a broad definition of what
constituted a change. The most difficult part is pages that contain
generated content that uses the structure of the content tree, like
menus.

So my question to you is: how do you flush pages with menus that are
generated from the content tree?

Regards,
Erik.

Ruben D. Orduz wrote:

Just as an anecdote, I’ve worked with paid enterprise-grade CMS’s,
open-source CMS’s and I’ve only seen two types of caching:

  1. The classic: time expiry unless forced by hand.
  2. Tag-level caching: You can choose which tags NOT to cache. So for
    example, in a news site, pretty much the whole site could be cached,
    except for content produced by <%breaking-news%>.


Erik van Oosten

Daniel S. wrote:

I see a bit of overhead being added to the rendering process (since
after_initialize hooks are a performance no-no), but since this should
reduce the need for rendering by a huge amount, I’m hoping it’ll balance
out.

I will probably put off an implementation of this until after I finish
off my asset extension and get other things running, but I might pull
this forward for asset caching as well… not sure yet.

I'm not opposed to what you've written above, but I wonder if there
might be a simpler way than what you have proposed.

I'm wondering if deleting the whole cache every time a change is made
might work better. Currently, the cache is cleared when a snippet or
layout is edited. This is a bit time consuming, as the cache is stored
on disk, but justifiable in the case of snippets and layouts. To speed
it up, I'm wondering if we could get better performance by storing the
rendered content in the db. Delete queries tend to be fast, and Radiant
is already serving the cached content itself anyway (so we wouldn't be
giving up the gain of having the Web server serve the page so that
Radiant doesn't have to).
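
Very roughly, something like this is what I have in mind (a sketch only -
the table and model names are made up):

# A db-backed cache: one row per rendered page, cleared wholesale on any change.
class CachedPage < ActiveRecord::Base
  # columns: url (string), body (text), rendered_at (datetime)

  def self.fetch(url)
    find_by_url(url)
  end

  def self.store(url, body)
    create(:url => url, :body => body, :rendered_at => Time.now)
  end

  # Clearing everything is a single fast delete query.
  def self.clear!
    delete_all
  end
end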

Whatever course we took we’d need to see the numbers, but I favor a
simpler approach if performance is comparable.


John L.
http://wiseheartdesign.com

From looking at response_cache, there are only two features that it
implements that the default rails action caching didn’t (easily):

- Expire Everything
- Expire After time

Am I missing something else? If I’m not, then I think I might have found
something that will give a big jump on what I want to get done:

http://www.agilewebdevelopment.com/plugins/action_cache

This performs both of the above functions, plus:

- If-Modified-Since/Last-Modified headers (really important for any page
  with heavy traffic)
- expiry time can easily be set on a page-by-page basis
- optionally use X-Sendfile to make the webserver do the heavy lifting
- all of the rails cache storage types (though only file-based will work
  with X-Sendfile)
- regexp-based expiry (great for expiring my asset transforms on an
  asset update)
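
For reference, all of that rides on top of the stock Rails action
caching hooks, which already cover the basic usage (a sketch only - I
haven't dug into exactly which options the plugin layers on top):

class SiteController < ApplicationController
  # stock Rails action caching; the plugin adds expiry/304/X-Sendfile handling
  caches_action :show_page
end

# and something like this in a sweeper/observer to drop a stale page:
# expire_action(:controller => 'site', :action => 'show_page', :url => page.url)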

Get back to me asap if I’m missing some other magic feature of the
response cache, otherwise I’ll see how I go with ripping it out.

Dan.

Daniel S. wrote:

From looking at response_cache, there are only two features that it
implements that the default rails action caching didn’t (easily):

- Expire Everything
- Expire After time

It also caches the entire response (headers + body).

Yeah, I left that off because I thought that the rails action cache
already did that - but regardless, the action_cache plugin adds that
behaviour. I’ll push on ahead.

Daniel S. wrote:

From looking at response_cache, there are only two features that it
implements that the default rails action caching didn’t (easily):

  • Expire Everything
  • Expire After time

It also caches the entire response (headers + body).

Am I missing something else? If I’m not, then I think I might have found
something that will give a big jump on what I want to get done:

http://www.agilewebdevelopment.com/plugins/action_cache

That looks very nice. Please do investigate.


John L.
http://wiseheartdesign.com

Jeepers. I just did the coding to use action_cache instead of the
ResponseCache mechanism and, feeling all proud, I decided to run some
benchmarks.

Raw apache can serve up a 60Kb page on my server at 677 requests/sec.

Action cache turned out to be two orders of magnitude slower than that -
6.7 req/sec - and that's using mod_xsendfile. Without it, it's half that
again - 3.2 req/sec.

Radiant's ResponseCache can spit out a 60Kb page on my server at 65
requests/sec. That's an order of magnitude slower than raw apache (677
req/sec), but still quite a reasonable rate.

I don’t really see anything in action cache to account for such a
performance hit - I’m guessing it’s in the underlying rails fragment
cache and the many levels of redirection there.

Seems playing with the cache was a waste of time. I probably should have
done my last benchmark first - the simplest rails app I could make,
serving up just the string 'hello', can manage only 73 requests/sec - if
I had just compared that to the ResponseCache speed I wouldn't have even
bothered.

I still want to get handling of if-modified-since and 304 responses into
the ResponseCache, as well as some finer control over the cache timeout
for a page on a page-by-page basis, but looks like action_cache is not
the way to go there.
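
The 304 handling I'm after would boil down to something like this inside
ResponseCache (a sketch only - the helper names here are made up, not
the actual class internals):

require 'time'  # for Time.httpdate / Time#httpdate

# Compare the client's If-Modified-Since against the mtime of the cached
# file and skip the body when nothing has changed.
def serve_cached(path, request, response)
  cached_at   = File.mtime(cache_path_for(path))
  if_modified = request.env['HTTP_IF_MODIFIED_SINCE']

  if if_modified && Time.httpdate(if_modified) >= cached_at
    response.headers['Status'] = '304 Not Modified'
    response.body = ''
  else
    response.headers['Last-Modified'] = cached_at.httpdate
    response.body = File.read(cache_path_for(path))
  end
end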

Dan.

  • It’s annoying to have to clear page cache after every edit in order to
    view your change.

The cache is automatically cleared for the page you edited, but only
that page. If you edit a layout or snippet, the whole cache is cleared.

Also, I see no reason why we can't attach a Preview button directly to
each page edit screen.

I think this would be a nice feature too. PDI?

Well then can we make Apache serve everything? Why not have the option
to make Radiant generate a full directory of HTML files. A possible way
to support having select pieces of the site be dynamic is to use Apache
server side includes to make calls back to Radiant for specific pieces.

This would be an ideal case, but I think the path Radiant takes is a
good compromise. The advantage of Radiant's caching system over static
files is that you can include headers.

I know you're probably using Apache in a figurative sense, but not
everyone uses Apache. kckcc.edu runs quite speedily and effortlessly
(excepting the site map) on Litespeed. We use the built-in LSAPI Rails
bridge, which is purported to have a 30% boost over FastCGI. There are
other ways to eke out performance too, not all of which have to do with
the caching.

Are there any other reasons for changing Radiant’s caching?

I don’t see any. It’s “good enough” for most cases, I believe.

The unmentioned alternative, of course, is memcached. It would take
minimal changes to the code, probably a single line in environment.rb,
since ResponseCache piggy-backs on the Rails caching mechanism. However,
not everyone could run memcached.

Sean

Before changing how the caching works, I would ask what the reasons are
for modifying the caching. I can think of a few.

  • It’s annoying to have to clear page cache after every edit in order to
    view your change.

The solution Dan suggests would make it possible to know exactly which
pages need their cache cleared. However, having a “versioning” system
like BJ suggests would also help to accomplish this goal. And even
currently, you can have a dev URL to your site where none of the pages
are cached.

Also, I see no reason why we can’t attach a Preview button directly to
each page edit screen.

  • Apache is faster than Radiant’s cache.

Well then can we make Apache serve everything? Why not have the option
to make Radiant generate a full directory of HTML files. A possible way
to support having select pieces of the site be dynamic is to use Apache
server side includes to make calls back to Radiant for specific pieces.

Are there any other reasons for changing Radiant’s caching?

Well then can we make Apache serve everything? Why not have the option
to make Radiant generate a full directory of HTML files. A possible way
to support having select pieces of the site be dynamic is to use Apache
server side includes to make calls back to Radiant for specific pieces.

This would be an ideal case, but I think the path Radiant takes is a good
compromise. The advantage of Radiant’s caching system over static files is
that you can include headers.

Sometimes this kind of information gets lost so here is a reminder:

The Corex branch (an experimental transition from Trunk to Mental) has
had a working implementation of a new caching mechanism that writes
files to public/ once they are requested (GET). Since public/ takes
precedence over the application controller, you can get extremely good
performance results. Unfortunately, this type of static caching turned
out to be inflexible because headers cannot be cached. Otherwise it
would have been a powerful alternative to the current caching mechanism.
Too bad headers can't be reliably modified with tags.



Alexander H.
http://www2.truman.edu/~ah428


I don’t see any. It’s “good enough” for most cases, I believe.

The unmentioned alternative, of course, is memcached. It would
take minimal changes to the code, probably a single line in
environment.rb , since ResponseCache piggy-backs on the Rails caching
mechanism. However, not everyone could run memcached.

Actually, ResponseCache doesn’t piggy-back on the rails caching
mechanism - at least not in the sense where it would be simple to swap
in one of the other rails caching backends. However, I think this might
actually be the reason why ResponseCache manages to perform so well -
all the abstraction in the rails caching creates a big performance hit.
Writing an extension that uses memcache rather than the filesystem would
require replacing the entire ResponseCache class.

Dan.

Radiant currently caches only the page urls, without the params after
the ?
Am I the only one who thinks this is bad?
For example my products page listing works by
/products?offset=1200&limit=10
and I must turn off caching to make this work. Why not just let the
cache work on the full uri? People do flip back and forth through a
catalog, so I did benefit from changing this. If I had truly dynamic
content, I'd turn off the caching.

This in an extension did the trick:

class SiteController < ApplicationController

  def show_page
    response.headers.delete('Cache-Control')
    uri = request.request_uri.to_s[1..-1]
    url = params[:url].to_s
    if live? and @cache.response_cached?(uri)
      @cache.update_response(uri, response)
      @performed_render = true
    else
      show_uncached_page(url, uri)
    end
  end

  private

  def show_uncached_page(url, uri)
    @page = find_page(url)
    unless @page.nil?
      @page.request = request
      @page.process(request, response)
      @cache.cache_response(uri, response) if live? and @page.cache? # and not request.post?
      @performed_render = true
    else
      render :template => 'site/not_found', :status => 404
    end
  rescue Page::MissingRootPageError
    redirect_to welcome_url
  end

end

It also seems to be a good idea not to put a POST request's response
into the cache. A POST is supposed to change data, and the outcome of
that should not be cached - I should not say OK if you just signed up as
root…

What do you think this change will break?

Laszlo

It also seems to be a good idea not to put a POST request's response
into the cache. A POST is supposed to change data, and the outcome of
that should not be cached - I should not say OK if you just signed up as
root…

class Page
  def cache?
    request.get?
  end
end

Perhaps that should be the definition in core. I should fix that up.

As for parameters - a workaround that also gives you prettier urls would
be to have your page return itself from find_by_url and extract the
params from the url, so

/products?offset=1200&limit=10

becomes

/products/perpage/10/page/120
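
Roughly, that could be a custom page type like this (a sketch only - I
haven't double-checked the exact find_by_url signature in trunk):

class ProductsIndexPage < Page
  # Claim urls under this page that look like .../perpage/N/page/M,
  # stash the numbers, and return self so the normal render kicks in.
  def find_by_url(url, live = true, clean = false)
    if url =~ %r{perpage/(\d+)/page/(\d+)/?$}
      @per_page, @page_number = $1.to_i, $2.to_i
      self
    else
      super
    end
  end
end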

If you’re doing things with query strings, I generally wouldn’t allow
the page to be cached - there’s too much variability -
offset=1200&limit=10 or limit=10&offset=1200?

You could also introduce a custom controller there, which might be a
better approach for trying to introduce such functionality into a CMS.

Dan.

Regarding caching, I might just try the following:

  1. Make the page parts accessible via urls ( /pagename/partname ).
  2. Make the snippets accessible via urls ( /snippet/snippetname …
     might clash ).
  3. Have a method on PagePart which determines whether its rendered
     content is dynamic or not (i.e. if it depends on the clock or on an
     outside datasource, then it's dynamic).
  4. Have the controller set up an ssi object (a rough sketch of it is
     below this list), give it the url ( Ssi.new url ), and put it on a
     stack.
  5. a. When a part/snippet/layout is called for rendering and it is
        dynamic, ssi.append "", and push a new ssi onto the stack with a
        nil url (of course that stack could be the callstack, but that's
        intrusive). After the normal rendering, write the popped ssi to
        the public dir if it has a url.
     b. Otherwise, append the rendered output to the top ssi object
        ( ssi.append content ), and give it a url if that's nil.
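
Something like this is the Ssi object I have in mind (purely
hypothetical - nothing is written yet):

require 'fileutils'

# Accumulates rendered output for one ssi fragment and knows where to write it.
class Ssi
  attr_accessor :url

  def initialize(url = nil)
    @url    = url
    @buffer = ''
  end

  def append(content)
    @buffer << content
  end

  # Written under public/ so apache serves (and ssi-filters) it directly.
  def write(public_dir = 'public')
    return unless @url
    path = File.join(public_dir, @url)
    FileUtils.mkdir_p(File.dirname(path))
    File.open(path, 'w') { |f| f.write(@buffer) }
  end
end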

I expect to end up with an ssi file that includes the dynamic parts,
which are made accessible via urls. Those dynamic radiant page parts can
in turn include more static ssi files, and those can include dynamic
parts again… which isn't desirable if it's slow, but I'll fight that
another day. This is all possible because apache2 can do ssi on our
radiant output too (as a filter).

I also expect this to be a rocket compared to what we have now.

If I want to control the outgoing headers, I'd have to mark the outer
page dynamic so it gets to go first and the ssi filtering only happens
on its output - but this is in the clouds, I can't see yet whether it
can be done or not. Apache might just hate us trying to set a header
once it has already sent one out because of the ssi. I'll have to try
that.

Overhead: little. Dynamic things push and pop once, nothing else. Static
things write their output twice (once for the ssi and once for the
response), but in return they will never be rendered again, since the
ssi takes over the load.

Dependency: when we edit in the admin, we delete all the corresponding
ssi files… which is not at all obvious - I would need to have left some
breadcrumbs behind when the just-edited part was appended to an ssi.
Those dependent ssis then need to be deleted.

I wish I could do all this in an extension, but so far I'm unable to
override anything in Page. If someone with strong ruby kung-fu can help
me…
Until then I'll just extend it.

Laszlo