Getting the most out of our caches when dealing with external HTTP services

dubstep · November 21, 2011, 10:56pm

Dear all,

I am dealing with a group of pages that display data from a few
different HTTP resources. To make these pages performant, I want to
cache as much as possible at the front of my Rails application.

Another requirement is that we never show stale data to the user.
These pages don’t get huge numbers of requests per second, but when
nothing has changed I want everything to feel snappy. It’s okay if the
first requests take a bit longer while they fill the caches.

Fortunately, HTTP has shipped with something awesome since the 80’s:
conditional requests. Just send an ‘If-Modified-Since’ or ‘If-None-Match’
header along with the request, and the server returns either a ‘304 Not
Modified’ or the full response.
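A minimal sketch of such a conditional GET using Ruby's standard `net/http`, assuming the remote service sends an `ETag` header. The method name `conditional_get` and the `{ etag:, body: }` hash used to remember the previous response are illustrative, not part of any library; with `Last-Modified` / `If-Modified-Since` the flow would be the same.

```ruby
require 'net/http'
require 'uri'

# Conditional GET: send the ETag from a previous 200 as If-None-Match.
# `cached` is nil or a Hash { etag:, body: } remembered from an earlier 200.
# Returns [:not_modified, etag, body] on a 304 (body taken from `cached`)
# or [:modified, etag, body] on a 200.
def conditional_get(uri, cached = nil)
  request = Net::HTTP::Get.new(uri)
  request['If-None-Match'] = cached[:etag] if cached && cached[:etag]

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end

  case response
  when Net::HTTPNotModified
    # 304: the server sent no body, so reuse the one we stored earlier
    [:not_modified, cached[:etag], cached[:body]]
  when Net::HTTPSuccess
    [:modified, response['ETag'], response.body]
  else
    raise "Unexpected response #{response.code} for #{uri}"
  end
end
```

On the first call nothing is cached, so the full body comes back along with its ETag. Passing that pair back in on the next call lets a server that supports ETags answer 304, and the stored body is reused without being transferred again.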

It is simple to cache HTTP responses in the Rails cache store. But the
most time-consuming part is actually building an object model from the
response and generating the HTML fragments.

Therefore I am looking for an API that allows me to conditionally
execute render code, based on the response of external HTTP requests.

[browser] --[GET /page]--> [Rails app: view / controller / model] --[GET /resource]--> [external service]

I tried to come up with a first proposal:


https://gist.github.com/mlangenberg/1383983

awesome_cache.rb

require 'nokogiri'

class Cache
  def initialize
    @store = {}
  end
  
  def read key
    @store[key] 
  end

  # ... (truncated here; the full file is in the gist)
end

What do you guys think? Does this make any sense? Are there any other
approaches that I could try?

At least it provides a way to pass through the different layers without
leaking knowledge between them. The downside, of course, is that every
finder needs to support a block, where it would normally just return a
result value.
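The block-based finder idea can be sketched in a few lines. This is a hypothetical illustration, not the code from the gist (which is truncated in this thread): `BlockFinder`, the injected `fetcher` callable and its `[status, etag, body]` return shape are all assumptions. The block, which would do the expensive parsing and rendering, only runs when the upstream resource changed; on a 304 the previously produced result is returned.

```ruby
# Hypothetical block-taking finder (names are illustrative, not from the gist).
# `fetcher` is any callable taking (url, etag_or_nil) and returning
# [:not_modified, etag, nil] or [:modified, etag, body].
class BlockFinder
  def initialize(fetcher)
    @fetcher = fetcher
    @store = {} # url => { etag:, result: }
  end

  # Yields the raw body only when the resource changed upstream;
  # otherwise returns the result the block produced last time.
  def find(url)
    cached = @store[url]
    status, etag, body = @fetcher.call(url, cached && cached[:etag])
    return cached[:result] if status == :not_modified

    result = yield(body)
    @store[url] = { etag: etag, result: result }
    result
  end
end
```

Because this store keeps exactly one result per URL, two different fragments rendered from the same resource would overwrite each other, which is the one-fragment-per-resource limitation of this approach.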

This approach makes it impossible to cache different HTML fragments of
the same resource, but I think I can mitigate that in a follow-up
proposal.

Thank you for providing feedback,

Matthijs

mlangenberg · November 22, 2011, 12:05am

On Nov 21, 9:54pm, Matthijs L. wrote:

> ... bit more time for the first requests to set up the caches it’s okay.
>
> Fortunately HTTP ships with something awesome since the 80’s:
> conditional HTTP request. Just send an ‘If-Modified-Since’ or ‘If-None-Match’
> header along with the request and the server returns a ‘304 Not
> Modified’ or the full response.

Nit picker’s corner: the first version of HTTP, 0.9, was in 1991, and if
my reading is correct HTTP 1.0 is the one that added If-Modified-Since
(ETag and If-None-Match only arrived with HTTP 1.1).

> I tried to come up with a first proposal: awesome_cache.rb (GitHub gist)
>
> What do you guys think? Does this make any sense? Are there any other
> approaches that I could try?

Could you be using Action Controller’s stale? / fresh_when methods?
If so, you’ll probably want to use something like rack-cache or
Varnish in front of Rails, since otherwise you’d only be able to
return a 304 if that particular client had already requested the data
(which may or may not be a problem). If you only want to cache bits of
the page, you could also use bog-standard fragment caching, using the
ETag / Last-Modified value of the remote response as part of the
cache key.
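A minimal sketch of the fragment-caching suggestion, with the remote response's ETag folded into the cache key. A plain Hash stands in for the Rails cache store, and `FragmentCache` and the key layout are hypothetical. When the upstream ETag changes, the key changes too, so the old fragment is simply never read again (a size-bounded store such as memcached would eventually evict it). In a Rails view the same idea would be an array key like `cache [..., etag] do`.

```ruby
# Hypothetical fragment cache; a Hash stands in for Rails.cache.
class FragmentCache
  def initialize
    @store = {}
  end

  # Returns the cached fragment for `key`, or renders it with the block
  # and stores it.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end

  # Builds a key such as posts/1/body/"abc123" from the resource,
  # the fragment name and the remote ETag (or Last-Modified value).
  def self.key_for(resource, fragment, etag)
    [resource, fragment, etag].join('/')
  end
end
```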

Fred

mlangenberg · November 23, 2011, 10:18pm

Alright, I think I can solve these issues by keeping two separate caches
(one for the raw responses and their ETags, one for the rendered HTML
fragments) and implementing lazy loading.

In the view layer I want to be able to do:

<% cache [@post.cache_key, 'author'] do %>
  <%= @post.author %>
<% end %>

Hello, <%= current_user.name %>, this is not cached.

<% cache [@post.cache_key, 'body'] do %>
  <%= @post.body %>
<% end %>

Then the API model could look something like this:

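require 'nokogiri'

module Api; end # namespace (Rails would autoload this from app/models/api/)

# $cache: any store responding to read/write, e.g. Rails.cache
class Api::Post
  def self.first
    etag = $cache.read('data:etag')
    response = fetch_first(etag)
    if response == :not_modified
      puts 'cache HIT'
    else
      puts 'cache MISS'
      etag = response.first # use the new ETag, not the stale one
      $cache.write 'data:etag', etag
      $cache.write 'data', response.last
    end
    new(etag)
  end

  # Stub standing in for the conditional GET: [etag, body] on a 200,
  # :not_modified when the stored ETag still matches (a 304)
  def self.fetch_first(etag)
    if etag.nil?
      puts 'Fetch XML'
      ['1449ee0ec320e5bf5ed7a9949d4771d9', '<post>Hallo!</post>'] # stub XML payload
    else
      :not_modified
    end
  end

  attr_reader :etag

  def initialize(etag)
    @etag = etag
    @document = nil
  end

  def body
    parse if @document.nil? # lazy loading: only parse when a fragment is rendered
    @document.children.first.text
  end

  def parse
    @document = Nokogiri.parse($cache.read('data'))
  end
end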

Two questions:

1. What can I do when the API returns a collection? I cannot just return an Array.
2. How can I wrap a domain layer around the Api::Post class?
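One possible direction for both questions, sketched under assumptions: instead of a bare Array, a finder could return a small collection object that includes `Enumerable` and carries the collection's ETag, so the view can still build a cache key from it. A domain object could then wrap each API record using `SimpleDelegator` from Ruby's standard library. `Api::PostCollection`, `Post`, `cache_key` and `excerpt` here are hypothetical names.

```ruby
require 'delegate'

module Api
  # Hypothetical collection returned by a finder: enumerable like an
  # Array, but it also remembers the ETag of the collection response.
  class PostCollection
    include Enumerable

    attr_reader :etag

    def initialize(etag, records)
      @etag = etag
      @records = records
    end

    def each(&block)
      return enum_for(:each) unless block

      @records.each(&block)
      self
    end

    # Key for fragment caching the whole list, e.g. posts/"v7"
    def cache_key
      "posts/#{etag}"
    end
  end
end

# Hypothetical domain layer: wraps an API record (anything responding to
# etag and body) and adds behaviour the view can use.
class Post < SimpleDelegator
  def cache_key
    "post/#{etag}"
  end

  def excerpt(length = 20)
    body.length > length ? "#{body[0, length]}..." : body
  end
end
```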

Matthijs

mlangenberg · November 22, 2011, 1:41pm

Hi Fred,

Thanks for getting back to me.

On Nov 22, 12:04am, Frederick C. wrote:

>> Dear all,
>>
>> Fortunately HTTP ships with something awesome since the 80’s:
>> conditional HTTP request. Just send an ‘If-Modified-Since’ or ‘If-None-Match’
>> header along with the request and the server returns a ‘304 Not
>> Modified’ or the full response.

> Nit picker’s corner: first version of http was 0.9 was in 1991, and if
> my reading is correct http 1.0 is the one that added if-modified-since
> etc.

You are totally correct. What I was trying to say is that we often
reinvent the wheel while there are beautiful gems inside existing
standards such as HTTP that offer a lot of functionality. I was
exaggerating the history of HTTP. Thanks for putting that right.

>> I tried to come up with a first proposal: awesome_cache.rb (GitHub gist)
> ... etag/last modified since etc. of the remote response as part of the
> cache key

I wonder how that would look.

To be clear, I am not interested in sending a ‘304 Not Modified’ to the
user’s browser. For my application that would not be worth the effort,
since each user typically views a given page only once. So the second
user should be served a cached response from Rails if possible, and by
cached response I mean small HTML fragments stitched together.

The request would still go through Rails, but based on a ‘304 Not
Modified’ from the external HTTP services, it would skip parsing the XML
responses and rendering expensive partials.

Am I right that in your approach, an ActiveResource finder would
attach the returned ETag or Last-Modified date to the returned object,
so it can be used inside a view?

I agree with you that the cache key should be chosen in the Rails
view. There should be another place that stores the last fetched ETag
for that particular resource, and that place also needs to know whether
there is a cached HTML fragment for that resource.

Looks like a Catch-22 situation to me.
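One way to order the lookups so they do not depend on each other circularly, sketched under assumptions: three plain Hashes stand in for the cache stores (the last ETag and raw body per resource, and rendered fragments keyed by resource, fragment name and ETag), and `fetcher` is a hypothetical callable performing the conditional GET. The conditional request runs first; only afterwards is the fragment looked up under the resource's current ETag. Because the raw body is kept, a fragment miss after a 304 can be re-rendered without refetching, and several fragments of the same resource can coexist.

```ruby
# Hypothetical pipeline; Hashes stand in for real cache stores.
class ResourceFragments
  def initialize(fetcher)
    @fetcher = fetcher # callable(url, etag_or_nil) => [status, etag, body_or_nil]
    @etags = {}        # url => last ETag seen
    @bodies = {}       # url => raw body of the last 200
    @fragments = {}    # [url, name, etag] => rendered HTML
  end

  # Returns the HTML for fragment `name` of `url`, rendering it with the
  # block only when no fragment exists for the resource's current ETag.
  def fragment(url, name)
    status, etag, body = @fetcher.call(url, @etags[url])
    if status == :modified
      @etags[url] = etag
      @bodies[url] = body
    end

    key = [url, name, @etags[url]]
    @fragments[key] ||= yield(@bodies[url])
  end
end
```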