On Thu, Mar 08, 2007 at 02:40:07AM +0900, Greg Loriman wrote:
> a bit of manual labour.
> I would love comments/advice on my above ideas, and further insights into
Well: my advice is that the sort of “loose federation” you describe is
something which is very difficult to build. You can make it work for, say, a
proxy-based POP3/IMAP mail cluster: here the protocol is simple enough that
the session can be unambiguously proxied to the right backend server, and
there is no interaction between accounts. (When you start using IMAP
folders, this breaks down.)
However, even in such a scenario, you don’t have resilience. If you lose the
machine where the A to G accounts are stored, then all those users lose
their mail. So in fact each backend machine has to be a mini-cluster, or at
least have mirrored disks and a warm spare machine to plug them into.
Many people have resilience as high, or higher, on their agenda than
performance. So this doesn’t sound like a good way to go.
My advice would be:
- Keep your database in one place, so that all the front-ends have
access to the same data at all times.
To start with, have a single database machine. Then expand this to a
2-machine database cluster. You can then point 2, 3, 4 or more front-end
machines at this cluster; for many applications you may find that you won’t
need to scale the database until later.
(Note that regardless of whether your application ends up heavier on
front-end CPU or back-end database resources, scaling the frontends and the
database cluster separately makes it much easier to monitor resource
utilisation and scale each part as necessary.)
The easiest way to do database clustering is with a master-slave
arrangement: do all your updates on the master, and let these replicate
to the slaves, where read-only queries take place. Of course, this isn’t
good enough for all applications, but for others it’s fine. (There’s a rough
sketch of this arrangement below, after these points.)
Full database clustering is challenging, but if your site is making you
plenty of money you can always throw an Oracle 10g grid at it. If you’re
thinking of that route, you can start with Oracle on day one; it is now free
for a single processor with up to 1GB of RAM and 4GB of table space.
- For transient session state, assuming your session objects aren’t
enormous, use DRb to start with. Point all your front-ends at the same DRb
server. DRb is remarkably fast for what it does, since all the marshalling
is done in C. (There’s a small sketch of this below.)
When you outgrow that, go to memcached instead. This is actually not hard to
set up: you just run a memcached process on each server. The session data is
automatically distributed between the nodes.
Neither case is totally bombproof: if you lose a node, you’ll lose that
node’s session data. Either put important session data in the database, or
build yourself a bombproof memcached server [boots from flash, no hard
drive, fanless].
If that’s not important, then you don’t need a separate memcached server: if
you have N webapp frontends, then just run memcached on each of them (also
sketched below).
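
To make the master/slave idea more concrete, here is a very rough sketch in
plain Ruby using the ruby-mysql bindings. The host names, credentials and
SQL are made up for illustration, and a real application would route queries
through its ORM rather than like this; it also glosses over replication lag
(a read sent to a slave immediately after a write may not see that write yet).

  require 'mysql'

  # All writes go to the master; replication carries them to the slaves.
  MASTER = Mysql.real_connect('db-master', 'app', 'secret', 'myapp')

  # Read-only queries can be spread across the slaves.
  SLAVES = ['db-slave1', 'db-slave2'].map do |host|
    Mysql.real_connect(host, 'app', 'secret', 'myapp')
  end

  def write(sql)
    MASTER.query(sql)
  end

  def read(sql)
    SLAVES[rand(SLAVES.size)].query(sql)
  end

  write("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
  read("SELECT balance FROM accounts WHERE id = 1").each { |row| p row }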
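
Here is a small sketch of the DRb session store idea. The port, host name
and session keys are invented; with Rails you would normally plug this in
behind the framework’s session handling rather than call it directly.

  # session_server.rb -- run one copy where all the front-ends can reach it
  require 'drb'
  require 'thread'

  class SessionStore
    def initialize
      @sessions = {}
      @lock = Mutex.new
    end

    def [](sid)
      @lock.synchronize { @sessions[sid] }
    end

    def []=(sid, data)
      @lock.synchronize { @sessions[sid] = data }
    end
  end

  DRb.start_service('druby://0.0.0.0:9192', SessionStore.new)
  DRb.thread.join

  # on each front-end, connect to that single shared store:
  #   require 'drb'
  #   store = DRbObject.new_with_uri('druby://sessions.internal:9192')
  #   store['abc123'] = { :user_id => 42 }
  #   p store['abc123']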
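
Here is roughly what the memcached version looks like from Ruby, assuming
the memcache-client gem. The host names are hypothetical; each front-end is
given the full list of nodes, and the client library hashes each key onto
one of them, which is how the data ends up spread across the cluster.

  require 'rubygems'
  require 'memcache'

  # Point every front-end at the same list of memcached nodes.
  CACHE = MemCache.new(['app1.internal:11211', 'app2.internal:11211'],
                       :namespace => 'myapp-sessions')

  CACHE.set('session:abc123', { :user_id => 42 })   # stored on one of the nodes
  p CACHE.get('session:abc123')                     # fetched back from that node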
> To facilitate the above I need some kind of proxy in front of the two
> machines directing incoming requests to the correct machine based on the
> login name which will be part of the url. Here I come unstuck since I have
> no idea how to do this.
Well, the traditional approach is to buy a web loadbalancing appliance (or a
resilient pair of them), and configure it for “sticky” load balancing based
on a session cookie or some other attribute in the URL.
Hardware appliances are generally good. They are reliable over time; there
is much less to go wrong than in a PC. They do a single job well.
You could instead decide to use a recent version of Apache with mod_proxy to
do the proxying for you.
But it may be better to design your app with a single shared database and a
single shared session store, such that it actually doesn’t matter where each
request ends up.
> Can anyone give me a few pointers? Is squid the thing? Mongrel (I don’t
> really know what mongrel is)? Can apache be made to do this, and if so is it
> a bad idea? Obviously it needs to be pluggable since I’ll be using my own
> code (C or Pascal) to do the lookups for the redirection.
mod_proxy with mod_rewrite is “pluggable” in the way you describe. See the
URL Rewriting Guide in the Apache mod_rewrite documentation, and skip to the
section headed “Proxy Throughput Round-Robin”.
You’d use an External Rewriting Program (map type prg) to choose which
back-end server to redirect to. The example there is written in Perl, but
the same is equally possible in Ruby, C, Pascal or whatever.
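
As a sketch of what such a program might look like in Ruby (the paths, host
names and the A-M/N-Z split here are all made up, and the Apache directives
in the comment are only indicative):

  #!/usr/bin/ruby
  # External rewriting program for a RewriteMap of type prg.
  # Apache starts it once and feeds it one lookup key per line on stdin;
  # it must reply with exactly one line on stdout per key ("NULL" = no match).
  # Wired up with something along these lines:
  #   RewriteMap  usermap  prg:/usr/local/bin/pick_backend.rb
  #   RewriteRule ^/u/([^/]+)(.*)  http://${usermap:$1}$2  [P,L]

  $stdout.sync = true   # essential: Apache waits for each answer unbuffered

  # Toy rule: logins a-m go to one backend, n-z to the other.
  def backend_for(login)
    login.downcase[0,1] <= 'm' ? 'backend1.internal:3000' : 'backend2.internal:3000'
  end

  while line = $stdin.gets
    login = line.strip
    puts(login.empty? ? 'NULL' : backend_for(login))
  end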
However, if you don’t know anything about Apache, this is certainly not
where I’d recommend you start.
squid is a proxy cache. You can use it to accelerate static content, but it
won’t help you much with dynamic pages from Rails. Mongrel is a webserver
written in Ruby, much as webrick is, although it is apparently more
efficient.
In summary I’d say start your design with the KISS principle:
- one database; scale it horizontally (by database clustering) when needed
- one global session store; scale it horizontally when needed
- one frontend application server; scale horizontally when needed
In addition to that, consider:
- serve your static HTML, images and CSS from a fast webserver
  (e.g. apache, lighttpd). This is easy to arrange.
- serve your Rails application from the same webserver using
  fastcgi (e.g. Apache mod_fcgid), rather than a Ruby webserver like
  mongrel or webrick. This is harder to set up, but you can migrate to it
  later. Then most of the HTTP protocol handling is being done in C.
- profile your application carefully to find out where the bottlenecks are,
  before you throw hardware at performance problems.
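
On that last point, even before reaching for a full profiler, the standard
library’s Benchmark module will give you a first idea of where the time
goes; the labels and bodies below are just placeholders to be replaced with
your own suspect code paths.

  require 'benchmark'

  Benchmark.bm(25) do |b|
    # Replace these stub bodies with the code paths you suspect are slow.
    b.report('render front page (stub)') { 100.times { sleep 0.001 } }
    b.report('expensive query (stub)')   { 10.times  { sleep 0.01  } }
  end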