Rails Cluster Design

Hello everyone!

A colleague and I are beginning development on a site that will
eventually need to be very scalable (assuming our business model is a
good idea). My problem is that I’m not a sysadmin guru, and I’m not
terribly comfortable with the design of the hosting platform. We’re
going to start with two servers initially (for cost reasons), and I’m
considering the following solution:

Run lighty on one machine, as well as a set of FCGI listeners. Use the
second machine as a DB server, and add more FCGI listeners on it as we scale.
Keeping the DB server separate seems like a good idea, since
eventually I will want that machine to be ONLY a DB server.

As we scale, I’m thinking that we will add more machines to run FCGI
listeners, completely separate the DB server from app serving, and
front the whole thing with lighty on a machine that acts only as a web
server. If it gets any bigger than that, then I guess I’ll have to
hire a real sysadmin. But that’s putting the cart a couple miles in
front of the horse.
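
As a rough sketch of the two stages described above, the layout can be written down as data and checked. The host names are made up for illustration; only the roles (lighty, FCGI listeners, DB) come from the plan itself.

```python
# Hypothetical host names; the roles mirror the plan in the post.
STAGES = {
    "launch": {
        "server1": ["lighttpd", "fcgi"],
        "server2": ["db", "fcgi"],
    },
    "scaled": {
        "web1": ["lighttpd"],
        "app1": ["fcgi"],
        "app2": ["fcgi"],
        "db1": ["db"],
    },
}


def hosts_for(stage, role):
    """Return the sorted host names in a stage that run the given role."""
    return sorted(h for h, roles in STAGES[stage].items() if role in roles)


def is_dedicated(stage, role):
    """True if every host running `role` in this stage runs nothing else."""
    return all(STAGES[stage][h] == [role] for h in hosts_for(stage, role))


if __name__ == "__main__":
    for stage in STAGES:
        print(stage, "| fcgi on", hosts_for(stage, "fcgi"),
              "| dedicated db:", is_dedicated(stage, "db"))
```

Writing the target layout down early makes it easier to see which roles still share a box and which move would come next.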

My concerns are:

  1. Redundancy. We’re going to be storing and serving a large number of
    media files as part of the site, and I’m considering using MogileFS a
    la Robot Co-Op to handle storage of these files. I’m concerned about
    hardware failure and the implications of loss of data on these media
    files. Would I be best using hardware redundancy such as RAID, or are
    there better solutions?

  2. Off-site backups. (DB/Media Files, etc) Are there services that you
    can contract with? I see rsync is often used as a backup solution, so
    do you just get a server with a different host and use it purely for
    backup, or what?

  3. Is the design above workable? We have to start small, but I want to
    make sure I don’t make bad decisions that will haunt me as we scale.

  4. So that I don’t waste everyone’s time with replying to this, are
    there any good places to find info on this topic? People that have
    done it before, etc? I’ve got Eric H.'s posts regarding their
    setup, so I’m wondering if there are any other good sources of info.

Whew… Thanks for reading this far, and thanks for any info you can
offer!

Matt

I agree with everything Ezra said, but I’d add one piece of
advice.

Install a virtualization environment on your servers from day
one, and set up each environment as though you already had
the ideal number of machines you’d love to start with if money
wasn’t an issue.

That way, when money isn’t an issue, it will be silly simple
to reconfigure. If you never get there, it’s OK, you’ve only
given up a few percent of performance for a very wonderful
option later on.


– Tom M.

On Apr 1, 2006, at 2:31 PM, Matt W. wrote:

second machine as a DB server, and add more FCGI listeners on it as we scale.
Keeping the DB server separate seems like a good idea, since
eventually I will want that machine to be ONLY a DB server.

As we scale, I’m thinking that we will add more machines to run FCGI
listeners, completely separate the DB server from app serving, and
front the whole thing with lighty on a machine that acts only as a web
server. If it gets any bigger than that, then I guess I’ll have to
hire a real sysadmin. But that’s putting the cart a couple miles in
front of the horse.

I think that you are definitely on the right track in your thinking
here. Here is a sketch of a very similar setup I have worked on.

http://brainspl.at/rails_2-servers.pdf

My concerns are:

  1. Redundancy. We’re going to be storing and serving a large number of
    media files as part of the site, and I’m considering using MogileFS a
    la Robot Co-Op to handle storage of these files. I’m concerned about
    hardware failure and the implications of loss of data on these media
    files. Would I be best using hardware redundancy such as RAID, or are
    there better solutions?

Mogilefs or even just a SAN might be the best thing for large media
files like this.

  2. Off-site backups. (DB/Media Files, etc) Are there services that you
    can contract with? I see rsync is often used as a backup solution, so
    do you just get a server with a different host and use it purely for
    backup, or what?

rsync works very well for this. You can rsync to any other server
with an IP address that’s reachable from the server you want to back up.
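
A minimal sketch of what such a backup job might look like, assuming rsync over ssh to a second host. The paths, the host name and the `backup` user are placeholders, and the function only builds the command so it can be inspected before anything is run.

```python
import shlex


def rsync_command(source_dir, remote_host, remote_dir, user="backup"):
    """Build an rsync-over-ssh command that mirrors source_dir to a remote host.

    All names passed in are placeholders chosen by the caller.
    """
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    source = source_dir.rstrip("/") + "/"
    target = f"{user}@{remote_host}:{remote_dir}"
    # -a: archive mode, -z: compress in transit,
    # --delete: remove files on the target that are gone from the source.
    return ["rsync", "-az", "--delete", "-e", "ssh", source, target]


if __name__ == "__main__":
    cmd = rsync_command("/var/www/media", "backup.example.com", "/backups/media")
    print(shlex.join(cmd))  # pass cmd to subprocess.run to actually execute it
```

Note that `--delete` also mirrors accidental deletions, so it may be worth dropping that flag or keeping dated copies on the backup host. For the database, rsyncing a dump file is generally safer than copying live data files.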

  3. Is the design above workable? We have to start small, but I want to
    make sure I don’t make bad decisions that will haunt me as we scale.

I think your plan is sound. Just be meticulous about documenting
things as you set them up.

  4. So that I don’t waste everyone’s time with replying to this, are
    there any good places to find info on this topic? People that have
    done it before, etc? I’ve got Eric H.'s posts regarding their
    setup, so I’m wondering if there are any other good sources of info.

Well ;) I am writing a book about deployment with Rails, but it’s not
out for another 4-6 weeks.

Whew… Thanks for reading this far, and thanks for any info you
can offer!

Matt

Cheers-
-Ezra

I’m with Tom on this - Xen is free, and not too difficult to set up.
You can host N virtual machine instances across M physical systems,
and change M or N independent of each other. For example, if you need
to bring up a new Web server, but the spare capacity is on your
database box - no problem with Xen - you just bring up the Web server
on the database box and get the most out of your existing hardware.
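
To illustrate the "N virtual machines across M physical systems" idea, here is a small placement sketch. It is not a Xen API: the host names, the memory figures and the "most free memory wins" rule are all assumptions made for the example.

```python
def place_vm(hosts, vm_name, vm_mem_mb):
    """Put a VM on the host with the most free memory and return that host's name.

    `hosts` maps host name -> {"mem_mb": total, "vms": {vm name: mem in MB}}.
    """
    def free(name):
        return hosts[name]["mem_mb"] - sum(hosts[name]["vms"].values())

    best = max(hosts, key=free)
    if free(best) < vm_mem_mb:
        raise RuntimeError(f"no host has {vm_mem_mb} MB free for {vm_name}")
    hosts[best]["vms"][vm_name] = vm_mem_mb
    return best


if __name__ == "__main__":
    # Made-up inventory: the spare capacity happens to be on the database box.
    hosts = {
        "web-box": {"mem_mb": 2048, "vms": {"web1": 1024, "app1": 768}},
        "db-box": {"mem_mb": 4096, "vms": {"db1": 2048}},
    }
    print(place_vm(hosts, "web2", 1024))
```

In this made-up inventory the new Web server lands on the database box, which is the situation described above.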

A huge, HUGE plus in terms of building in scalability. If you’re
running Intel servers and not using Xen (or VMware) as part of your
scalability plans, it’d be worth your time to check it out.

Regards

Dave M.

Yeah, I do agree with both of you guys. Xen is the way to go. That
way you can ‘practice’ your cluster on less hardware. And as the
others have said, you can add hardware and move virtual machines
between physical boxes without restarting or affecting the running VMs.

The only concern that I have is with running your database server
under virtualization, although it really depends on how SQL-heavy
your apps will be.

Tom, David, what’s your experience with db performance on Xen? This
might be the one place you want as close to the metal as possible. So
you might run some tests to see if the database is fast enough under
Xen, and only if it’s not fast enough move it to its own box.

On another note, you can get some pretty nice dedicated boxes from

http://layeredtech.com

Cheers-
-Ezra

Can’t say I’ve run databases under Xen where the hardware has been
heavily pushed on an ongoing basis - yet.

However, the database is just another element of the overall solution,
and Xen lets you manage it on that basis - I happily run databases
under Xen in the knowledge that I can migrate them to a
different/bigger box quickly if and when capacity becomes an issue.
If it comes down to it, I’ll run the database under Xen on a system
with no other virtual servers - I only seem to lose ~5% of scalability
through introducing Xen, and the flexibility gained is well worth it.
How many boxes do you know where that 5% lost capacity would tip a box
over? For me, none.

On a separate note, there’s an awful lot you can do to boost the
performance of a typical database before you truly exhaust the
capacity of modern hardware. If a database starts to slow down, I’d
focus first on identifying and fixing what’s making it slow, before
considering switching to a bigger box.

Regards

Dave M.

On Apr 3, 2006, at 8:31 AM, Ezra Z. wrote:

Tom, David, what’s your experience with db performance on Xen?
This might be the one place you want as close to the metal as
possible. So you might run some tests to see if the database is
fast enough under Xen, and only if it’s not fast enough move it to
its own box.

Ezra,

Haven’t benchmarked it myself yet, but here’s a paper that did.

http://www.cl.cam.ac.uk/Research/SRG/netos/papers/2003-xensosp.pdf

Synopsis:

Benchmark   Native    Xen   Xen/Native

OSDB-IR        172    158       91.86%
OSDB_OLTP     1714   1633       95.27%
dbench         418    400       95.69%

Average                         94.28%
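
The percentages in the synopsis can be reproduced from the raw scores. This short sketch recomputes them, assuming higher scores are better and that the average is a plain mean of the three ratios.

```python
# Scores copied from the synopsis above: (native, Xen).
RESULTS = {
    "OSDB-IR": (172, 158),
    "OSDB_OLTP": (1714, 1633),
    "dbench": (418, 400),
}


def relative(native, xen):
    """Xen score as a percentage of the native score."""
    return 100.0 * xen / native


def average_relative(results):
    """Unweighted mean of the per-benchmark percentages."""
    ratios = [relative(native, xen) for native, xen in results.values()]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for name, (native, xen) in RESULTS.items():
        print(f"{name:10s} {relative(native, xen):6.2f}%")
    print(f"{'Average':10s} {average_relative(RESULTS):6.2f}%")
```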

My conclusion: Well worth the up front effort and loss of performance
for the ability to move an entire system to new hardware in a matter
of minutes, guaranteed.


– Tom M.

On Apr 1, 2006, at 2:31 PM, Matt W. wrote:

Run lighty on one machine, as well as a set of FCGI listeners. Use the
second machine as a DB server, and add more FCGI listeners on it as we scale.
Keeping the DB server separate seems like a good idea, since
eventually I will want that machine to be ONLY a DB server.

Our first DB box ran into IO load long before it hit CPU limitations
because it didn’t have enough memory for caching. If it’s in your
budget, start with 64-bit hardware (like an Opteron) so you can add
memory as you scale. More memory means more in-memory caching, which
means less disk IO. 32-bit hardware can address at most 4GB per
process; we hit that mark in about a year of growth.
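
The 4GB figure follows directly from the pointer width. A quick check of the arithmetic (theoretical limits only; operating systems and real hardware usually allow less):

```python
def address_space_gib(pointer_bits):
    """Size of a flat address space for the given pointer width, in GiB."""
    return 2 ** pointer_bits / 2 ** 30


if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: {address_space_gib(bits):,.0f} GiB")
```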

As we scale, I’m thinking that we will add more machines to run FCGI
listeners, completely separate the DB server from app serving, and
front the whole thing with lighty on a machine that acts only as a web
server. If it gets any bigger than that, then I guess I’ll have to
hire a real sysadmin. But that’s putting the cart a couple miles in
front of the horse.

Put your system configuration under version control, it’ll save you
lots of hassles. If you have half an idea of how to sysadmin you can
probably get some high-school student for cheap and say “make it
work!” without too much hand-holding. Every problem you’ll hit has
already been solved and indexed by Google.

  1. Redundancy. We’re going to be storing and serving a large number of
    media files as part of the site, and I’m considering using MogileFS a
    la Robot Co-Op to handle storage of these files. I’m concerned about
    hardware failure and the implications of loss of data on these media
    files. Would I be best using hardware redundancy such as RAID, or are
    there better solutions?

Use RAID until you have too many files for one machine, then switch to
MogileFS. YAGNI (MogileFS).

Don’t run MogileFS on RAID; it handles redundancy for you. You’ll only slow
things down (and waste a disk you could store files on!).
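
A back-of-the-envelope sketch of the "wasted disk" point. It assumes every file is stored a fixed number of times and that RAID costs a fixed number of disks for parity (one, as in a single RAID 5 array); the disk counts and sizes are made up.

```python
def usable_gb(disks, disk_gb, replicas, parity_disks=0):
    """Usable capacity when each file is stored `replicas` times.

    parity_disks is the number of disks lost to RAID parity underneath
    (0 means plain disks, 1 models a single RAID 5 array).
    """
    return (disks - parity_disks) * disk_gb / replicas


if __name__ == "__main__":
    # Hypothetical box: four 250 GB disks, two copies of every file.
    print("plain disks:", usable_gb(4, 250, replicas=2), "GB")
    print("on RAID 5:  ", usable_gb(4, 250, replicas=2, parity_disks=1), "GB")
```

With these made-up numbers the RAID layer costs a quarter of the usable space while the files are already stored twice.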

  2. Off-site backups. (DB/Media Files, etc) Are there services that you
    can contract with? I see rsync is often used as a backup solution, so
    do you just get a server with a different host and use it purely for
    backup, or what?

We use Amanda. dump/restore is the most reliable way to perform
backups.

  3. Is the design above workable? We have to start small, but I want to
    make sure I don’t make bad decisions that will haunt me as we scale.

Embrace YAGNI and DRY. You’ll have no problems. Read the Pragmatic
Programmer and apply it diligently. Especially, especially,
especially don’t leave broken windows.

  4. So that I don’t waste everyone’s time with replying to this, are
    there any good places to find info on this topic? People that have
    done it before, etc? I’ve got Eric H.'s posts regarding their
    setup, so I’m wondering if there are any other good sources of info.

The LiveJournal architecture presentation is a good reference:

The most important thing to remember is don’t jump to the last
slide! Start at the front and expand as you go.


Eric H. - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com