I've seen some talk about GFS on the list in the past. I'm currently looking at it as a potential solution for deploying Rails apps to 50+ servers, basically to take advantage of GFS giving you a single disk/file-system across the servers, to help ensure truly one set of files deployed to all servers, faster deploys, etc.
We currently have 50+ servers, and that will grow. Our application architecture is SOA, so in reality one Rails app won't be on all 50 servers; they'll be grouped, say 10-20 servers per service.
I am currently eyeing a GFS setup where we use a server (per group) as the GFS disk, with GNBD across the machines. So: no SAN, no iSCSI, no fiber, etc. It's what I have available, so I'm weighing the advantage of GFS against deploying the code to all machines in a more traditional setup.
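For concreteness, the setup I have in mind looks roughly like this. The hostnames, device, mount point, cluster name, and journal count are all placeholders, and the commands are from memory of the RHCS/GNBD docs, so treat this as a sketch rather than a recipe:

# On the "disk" server for a group (say gfs-disk01, exporting /dev/sdb1):
$ gnbd_serv
$ gnbd_export -d /dev/sdb1 -e rails_shared
# Make the GFS filesystem once, with one journal per node that will mount it:
$ gfs_mkfs -p lock_dlm -t railscluster:rails_shared -j 20 /dev/sdb1
# On each of the 10-20 app servers in the group:
$ gnbd_import -i gfs-disk01
$ mount -t gfs /dev/gnbd/rails_shared /mnt/gfs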
The servers in this case are 64bit boxes, with dual cores, and GigE (dual, but for this discussion assume a single one, since we split the net on them, etc.). Also, our application file storage is done using a different infrastructure, so it doesn't play into this. Databases are also on different boxes.
I have not used GFS before, so I'm hoping for some input on some of these questions:
I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and the ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
Thoughts, comments, notes on this approach in general?
I have not used GFS before, so I'm hoping for some input on some of these questions:
I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
Yeah it’s no biggy.
Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and the ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
GFS has something called context-dependent symlinks. These let you make symlinks that resolve to a different path based on things like hostname. So you set up a set of directories named after all the hostnames in the cluster, then make log a symlink to @hostname. Observe:
ey00-s00070 ~ # cd /data/ey/shared/
ey00-s00070 shared # ls -lsa
total 32
4 drwxrwxr-x 7 ez ez 3864 Dec 3 2006 .
4 drwxr-xr-x 4 ez ez 3864 Dec 7 15:14 ..
4 drwxrwxrwx 2 ez ez 3864 Jun 1 13:30 ey00-s00070
4 drwxrwxrwx 2 ez ez 3864 Jun 4 22:07 ey00-s00071
4 lrwxrwxrwx 1 ez ez 9 Dec 3 2006 log -> @hostname
See how log is a symlink to @hostname? After you make a directory named after each of your hostnames that share the filesystem, you do this to link them:
$ ln -s @hostname log
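Putting it together, the whole setup for a group is roughly this (only the two hostnames from the listing above are shown, and the Rails app path is a placeholder):

$ cd /data/ey/shared
$ for host in ey00-s00070 ey00-s00071; do mkdir -p "$host"; done
$ ln -s @hostname log
# now every node that mounts this filesystem resolves /data/ey/shared/log
# to its own per-host directory, so you can point each app at it, e.g.
# (placeholder path, assuming the app's own log dir has been moved aside):
$ ln -s /data/ey/shared/log /path/to/railsapp/log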
Thoughts, comments, notes on this approach in general?
I have many, many nodes sharing GFS filesystems and it works great in general, much more robust than NFS. I do it all off of a SAN network though, so I have no experience with the way you are trying to do it with no SAN.
Cheers-
– Ezra Z.
– Lead Rails Evangelist
– [email protected]
– Engine Y., Serious Rails Hosting
– (866) 518-YARD (9273)
Thanks Ezra. Actually, I believe it was your talk at RailsConf that inspired this. So thanks again! I suspect that, for the short term, with only say 10-20 servers per GFS disk, it may be ok. If that all works out and we scale up, then I'm sure we'd go SAN. Thanks for the info on the contextual symlinks, that's very cool.
How does GFS handle immense volume? Say you have these 20 servers writing their logs to a single "disk", and you're being Slashdotted/Dugg, so there's tons of logging (I guess enough to overwhelm GigE, though I haven't calculated whether that's realistic in such a case). How does GFS behave?
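Back-of-envelope, with made-up numbers just to sanity-check the GigE worry: if each request generates around 2 KB of log output and each server handles 500 requests/sec, that's 20 servers x 500 req/s x 2 KB, or roughly 20 MB/s of log traffic, versus roughly 125 MB/s theoretical for a single GigE link. So under those assumptions the wire itself isn't the bottleneck; the single GNBD/GFS server's disk and lock traffic seem more likely to be.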
Obviously I'll have to test all this, but I'm hoping to short-circuit any insurmountable problems or bad usage, etc.
I presume that for the actual Rails application code, since it gets loaded up once in production mode, that say 20 servers pulling that from a single GNBD/GFS file system server would be no biggy. Correct?
Yeah it's no biggy.
Hate to disagree with one of our own, but you'll find that RHCS has a practical limit of 16 machines per cluster, unless you're using GULM, which is no longer recommended.
At Engine Y. we sidestep this limitation by utilizing a two-tiered cluster structure: one for the nodes in the cluster, and one for each customer environment.
Logs - this seems to be the danger area to me. Assuming we have "high traffic", and that we do quite a bit of logging (we log a lot of info for metrics and the ability to follow requests through the SOA architecture, etc.), I worry about 20 servers all writing to a single log on the one GNBD/GFS server. Valid worry, or? Are there alternatives I should look at for logging in such an environment?
I'd recommend aggregating the logs via something akin to syslog, or something based upon the wonderful but underutilized Spread library. Far simpler and more robust.
GFS is great, don't get me wrong, but I can absolutely guarantee you that you'll be disappointed if you put 50 machines into a single RHCS cluster.
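As a rough sketch of the syslog route (the "loghost" name and the local7 facility are placeholders, and the config line is classic sysklogd syntax, so adjust for whatever syslog you actually run):

# on each app node, forward a facility to the central log box:
$ printf 'local7.*\t@loghost\n' >> /etc/syslog.conf
$ /etc/init.d/syslog restart
# on loghost itself, syslogd has to be started with -r to accept remote messages
# quick sanity check from any node; the line should show up on loghost:
$ logger -p local7.info "test line from $(hostname)"

The Rails side can then be pointed at syslog (something like the SyslogLogger library) instead of writing to log/production.log on the shared disk.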
Yes, good point Tom. I had actually meant to switch to syslog logging, but hadn't gotten to it yet. I've heard Spread is excellent, and we're beginning to look at it for a few other things as well, so we'll see.
As for servers, the 16 limit is interesting. Or rather, somewhat confounding. I guess they expect you to move to a different system if you are managing a lot more servers as part of a cluster, or that you'd do direct-attached storage, etc.?