Hello,

I'm looking to add more caching to our setup and am interested in integrating memcached with nginx. I have a few questions and am looking for a little direction.

Our current setup has an nginx front end serving static content (images, JS, CSS, etc.), with two backend servers running Apache/PHP. We already use memcached on the backend to store some HTML snippets and to cache some of our more expensive DB queries.

First question: has anyone compared serving pages from memcached directly through nginx against serving them out of memcached on the backend? Either way, the backend has to insert the whole page into memcached. So either nginx serves the page straight out of memcached (avoiding the overhead of the Apache hit), or Apache/PHP queries memcached and returns the page. The latter would be much easier to implement, but I'm not sure what the performance difference would be.

One reason the backend approach is easier: we only want to cache traffic from users who are not logged in. We "could" do this in nginx if we set a cookie for logged-in users and have nginx read that cookie, bypassing memcached when the cookie is present.

If nginx serves content from memcached, how is gzipping handled? Do we store the content in the cache already gzipped?

Another thing: we do a bit of A/B testing of our content. To track that fully, we'd need some percentage of sessions to bypass the cache. That's trickier from nginx, since we don't have the session information when pages are served out of memcached. I was thinking I could route a certain percentage of requests that have an external referrer to our backend, and cookie those users so they also bypass the cache for the rest of the session.

Looking at the memcached module documentation, how do you specify multiple memcached servers? It appears that it would treat them as mirrors rather than as a distributed cache.

I think that's it.
In any case, the main thing is, would the increased performance outweigh the additional complexity, if anyone's examined that in more detail (serving the cached pages via apache vs nginx directly)? Anything else I should be aware of? Thanks!
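
To make the routing rules above concrete, here is a rough sketch of the decision logic in Python. It is not nginx configuration, just the rules written out so they can be checked. The cookie names, host name, and sample rate are all hypothetical placeholders, not anything from our actual setup.

```python
import random
from urllib.parse import urlparse

# All of these values are hypothetical placeholders for illustration.
LOGGED_IN_COOKIE = "logged_in"
BYPASS_COOKIE = "ab_bypass"
SITE_HOST = "www.example.com"
AB_SAMPLE_RATE = 0.05  # share of externally referred sessions sent to the backend


def is_external(referrer):
    """True when a referrer is present and points at a different host."""
    if not referrer:
        return False
    host = urlparse(referrer).hostname
    return host is not None and host != SITE_HOST


def route(cookies, referrer, rand=random.random):
    """Return (target, cookies_to_set) for one request.

    target is "backend" (skip the cache) or "memcached" (serve the cached page).
    """
    # Logged-in users and sessions already in the A/B sample always skip the cache.
    if LOGGED_IN_COOKIE in cookies or BYPASS_COOKIE in cookies:
        return "backend", {}
    # Sample a share of externally referred requests into the A/B test and
    # mark them so the rest of the session bypasses the cache as well.
    if is_external(referrer) and rand() < AB_SAMPLE_RATE:
        return "backend", {BYPASS_COOKIE: "1"}
    return "memcached", {}
```

The `rand` parameter only exists so the sampling can be made deterministic when trying the function out, e.g. `route({}, "http://other.example.org/", rand=lambda: 0.0)`.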
on 2009-03-16 22:31
on 2009-03-17 02:04
Neil,

It sounds like you are looking for agreement on your rationale. If so, then yes, it seems generally sound. As you are no doubt aware, nginx serving directly from memcached will certainly be much faster than serving from memcached via PHP and Apache and THEN through nginx (in answer to your question). However, as you figured out, the memcached module is not currently flexible enough to accommodate your other needs by itself (maybe it doesn't need to be), so there is something to be said for the flexibility you get by choosing the key on the backend.

Personally, I would recommend keeping it flexible: use memcached on the front end for the general case, since it is the most efficient, but make it simple to switch back to the backend during A/B testing. If you don't want to maintain separate configs that you include via a symlink, you could probably implement this in much the same way people do maintenance pages, by checking for the existence of a file. Of course, that is an extra check on each request, so it will impact performance, and you might as well have stuck with just the backend.

With regard to multiple upstream servers, from what I can tell from the wiki documentation at http://wiki.nginx.org/NginxHttpMemcachedModule, you can use multiple backends by pointing memcached_pass at a backend defined in an upstream block, and then specifying with memcached_next_upstream which events cause the next upstream to be queried. This leads me to believe that it always uses the same upstream until a failure, then moves to the next one if you have defined cases for that. I might have a chance to look through the code later or simply try it, but I cannot guarantee it.

Please let us know if you find out!

Thanks,
Merlin
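
To spell out the "mirrors vs. distributed cache" distinction, here is a minimal Python sketch of the two selection strategies. It does not reflect how nginx is actually implemented. The first function is the failover behaviour guessed at above, the second is the key-hashing approach that memcached client libraries commonly use. The server addresses are made up.

```python
import zlib

# Hypothetical server addresses, for illustration only.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211"]


def pick_failover(servers, is_up):
    """Use the first server that is up; later servers act only as spares."""
    for server in servers:
        if is_up(server):
            return server
    return None


def pick_by_hash(key, servers):
    """Spread keys across servers; the same key always maps to the same server."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```

The practical consequence: if the backend writes pages with something like `pick_by_hash` but the front end reads with something like `pick_failover`, roughly half of the lookups on a two-server pool would go to the wrong server and miss. Whichever strategy the front end uses, the backend would need to store keys the same way.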
on 2009-03-17 02:10
One note on rereading my message: I was not trying to suggest that the stat() from the file-existence check would slow things down so much that it is as slow as going through memcached->PHP->Apache (though depending on circumstances it SOMETIMES might be). My point was rather that you care about either performance or flexibility, and you should maximize one or the other, not necessarily both. In the scheme of things, neither delay is likely to matter or be noticeable to anyone, even with both. It was simply a matter of simplicity ;).

- Merlin