Nginx + Memcached: Questions / Advice

Hello,

I’m looking to add more caching to our setup, and am interested in
integrating memcached with nginx. I have a few questions and am
looking for a little direction!

First, our current setup has an nginx front-end serving static content
(images, js, css, etc), with two backend servers running apache / php.
Currently, we utilize memcached on our backend, storing some snippets
of html and caching some of our more expensive db queries.

First question - has anyone done a comparison between setting up the
memcached integration through nginx and just serving the pages out of
memcached on the backend? That is, we already have to insert the
whole page into memcached on the backend. So, either I serve out of
memcached (and avoid the overhead of the apache hit), or I just have
apache / php query memcached and return the page.

The latter would be much easier to implement - not sure what sort of
performance difference there would be.

One reason it would be easier to handle the caching through our
backend - we need to cache only traffic that’s not logged in. We
“could” do this through nginx if we cookie logged-in users, have nginx
read that cookie, and bypass memcached when the cookie is found.
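
A minimal sketch of that cookie check, where logged-in users carry a
cookie and skip the cache. The cookie name (logged_in), the addresses,
and the choice of $uri as the key are all assumptions for illustration;
the backend would have to store each page under the same key. The 418
status is only an internal signal to jump to the backend.

```nginx
# Hypothetical addresses for the two apache / php servers.
upstream backend {
    server 10.0.0.10:80;
    server 10.0.0.11:80;
}

server {
    listen 80;

    location / {
        # Cache miss (404), memcached errors (502/504) and the internal
        # 418 signal all fall through to apache.
        error_page 404 418 502 504 = @backend;

        # Hypothetical cookie set by the backend at login.
        if ($cookie_logged_in) {
            return 418;
        }

        # Assumed key scheme: the backend stores pages under the bare URI.
        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;
        default_type text/html;
    }

    location @backend {
        proxy_pass http://backend;
    }
}
```

The cookie only decides routing here; the backend would still have to
validate the session itself.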

If we have nginx serving up content from memcached - how is gzipping
handled? Do we store it in the cache gzip’d?
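
For reference, two ways this could be wired up, as a sketch rather than
a tested setup. Option A stores plain HTML and lets the regular gzip
filter compress the response. Option B relies on memcached_gzip_flag,
which exists only in newer nginx releases (1.3.6 and later), and on
gunzip from ngx_http_gunzip_module, which is not built by default. The
flag value, addresses, and key scheme are assumptions.

```nginx
location / {
    error_page 404 502 504 = @backend;

    set $memcached_key "$uri";          # assumed key scheme
    memcached_pass 127.0.0.1:11211;     # hypothetical memcached address
    default_type text/html;

    # Option A: cache holds plain HTML, nginx compresses on the way out.
    gzip on;

    # Option B: cache holds gzip-encoded bytes, marked with an item flag.
    # nginx then adds "Content-Encoding: gzip" when the flag is set.
    # The value 2 is an assumption; it must match what the backend sets.
    # memcached_gzip_flag 2;
    # gunzip on;   # decompress for clients that do not accept gzip
}

location @backend {
    proxy_pass http://backend;          # hypothetical upstream name
}
```

With Option B the backend has to write real gzip output into the cache,
not whatever compression its memcached client applies by default.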

Another thing - we do a bit of A/B testing of our content. So, to
fully track that, we’d need some percentage of sessions to bypass the
cache. From nginx, that’s a bit more tricky, as we don’t have the
session information if things are served out of memcached. So, I was
thinking, I could just route a certain percentage of requests that
have an external referrer back to our backend, and cookie those users
to also bypass the cache for the rest of the session.
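
One way the percentage split could look, using split_clients. This
sketch leaves out the external-referrer condition and assumes the
backend sets a hypothetical "nocache" cookie on the requests it
receives, so those users keep bypassing the cache. The 10% figure and
all names are placeholders.

```nginx
# http context: put roughly 10% of clients into the "backend" bucket.
# The hash input (address plus user agent) is an assumption.
split_clients "${remote_addr}${http_user_agent}" $ab_bucket {
    10%     backend;
    *       cache;
}

server {
    listen 80;

    location / {
        error_page 404 418 502 504 = @backend;

        # Sampled clients go to apache, which can set the cookie.
        if ($ab_bucket = backend) {
            return 418;
        }

        # Already-cookied sessions keep bypassing the cache.
        if ($cookie_nocache) {
            return 418;
        }

        set $memcached_key "$uri";
        memcached_pass 127.0.0.1:11211;
        default_type text/html;
    }

    location @backend {
        proxy_pass http://backend;      # hypothetical upstream name
    }
}
```

Because the bucket is derived from a hash of the client string, the
same client tends to land in the same bucket on every request.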

Looking at the memcached module documentation - how do you specify
multiple memcached servers? It appears that it would treat them as
mirrors, not as a distributed cache?

I think that’s it. In any case, the main thing is, would the
increased performance outweigh the additional complexity, if anyone’s
examined that in more detail (serving the cached pages via apache vs
nginx directly)? Anything else I should be aware of?

Thanks!

Neil,

It sounds like you are looking for agreement on your rationale. If this
is the case, then yes, it seems generally sound. As you are no doubt
aware, nginx serving from a memcached backend directly will certainly be
much faster than serving it from memcached via PHP and apache and THEN
to nginx (in answer to your question). However (as you figured out), the
memcached module is not currently flexible enough to accommodate your
other needs by itself (maybe it doesn’t need to be, either), so there is
something to be said for the flexibility that you get by choosing the
key on the backend. Personally, I would recommend keeping it flexible so
that you can use memcached on the front-end for the general case, as it
is the most efficient, but make it simple to switch back to the backend
during A/B testing. If you didn’t want to have to maintain separate
configs that you include via a symlink, you could probably implement
this in much the same way people have done maintenance pages, by
checking for the existence of a file, but of course this is an extra
check for each request (so it will impact performance and you might as
well have stuck with just the backend).
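
A rough sketch of the file-existence switch, in the style of the usual
maintenance-page check. The flag file path, addresses, and key scheme
are hypothetical. Creating the file sends every request to the backend;
removing it restores serving from memcached, with no reload needed.

```nginx
location / {
    error_page 404 418 502 504 = @backend;

    # Hypothetical flag file: "touch" it during A/B tests.
    if (-f /etc/nginx/bypass_cache.flag) {
        return 418;                     # internal signal, handled above
    }

    set $memcached_key "$uri";          # assumed key scheme
    memcached_pass 127.0.0.1:11211;
    default_type text/html;
}

location @backend {
    proxy_pass http://backend;          # hypothetical upstream name
}
```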

With regard to the usage of multiple upstream servers, from what I can
tell from the wiki documentation here:
http://wiki.nginx.org/NginxHttpMemcachedModule you can use multiple
backends by using memcached_pass with a backend defined in an upstream
block and then specifying with memcached_next_upstream which events will
cause the next upstream to be queried.

This would lead me to believe that it always uses the same upstream
until a failure, then it will use the next one if you have defined cases
for that. I might have a chance to look through the code later or simply
try it, but I cannot guarantee it. Please let us know if you find out!
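
For anyone trying it, the configuration in question would look roughly
like this. The pool name and addresses are made up. One thing worth
knowing when testing: an upstream block balances round-robin by
default, so without further settings consecutive requests rotate across
the servers rather than sticking to the first one.

```nginx
upstream memcached_pool {
    server 10.0.0.20:11211;             # hypothetical memcached hosts
    server 10.0.0.21:11211;

    # Newer nginx (1.7.2+) can pick the server from the key instead of
    # rotating. The backend's memcached client would need a compatible
    # hashing scheme for this to behave as one distributed cache.
    # hash $memcached_key consistent;
}

server {
    listen 80;

    location / {
        error_page 404 502 504 = @backend;

        set $memcached_key "$uri";      # assumed key scheme
        memcached_pass memcached_pool;

        # Try the next server on errors, timeouts, and also on a miss.
        memcached_next_upstream error timeout not_found;
        default_type text/html;
    }

    location @backend {
        proxy_pass http://backend;      # hypothetical upstream name
    }
}
```

With not_found in the list, a miss on one server is retried on the
next, which suits mirrored caches where the backend writes every page
to all servers.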

Thanks,
Merlin

One note on rereading my message: I was not attempting to indicate that
the stat() from the file existence check would slow things down so much
that it is as “not fast” as going through memcached->PHP->apache (though
depending on things, it SOMETIMES might be), but rather that either you
care about performance or you care about flexibility, and you should
maximize one or the other, not necessarily both. In the scheme of
things, neither delay will likely matter or be noticeable to anyone,
even with both. It was simply a matter of simplicity ;).

- Merlin