GFS, logs, and 50+ servers

I’ve seen some talk about GFS on the list in the past. I’m currently
looking at it as a potential solution for deploying Rails apps to 50+
servers, basically to take advantage of GFS giving you a single
disk/filesystem across the servers: truly one set of files deployed to
all servers, faster deploys, etc.

We currently have 50+ servers, and that number will grow. Our
application architecture is SOA, so in reality one Rails app won’t be
on all 50 servers; they’ll be grouped, say 10-20 servers per service.

I am currently eyeing a GFS setup where we use a server (per group) as
a GFS disk, with GNBD across the machines. So: no SAN, no iSCSI, no
fiber, etc. It’s what I have available, so I’m balancing the advantage
of GFS against deploying the code to all machines in a more
traditional setup.

The servers in this case are 64-bit boxes with dual cores and GigE
(dual NICs, but for this discussion assume a single one, since we
split the network across them, etc.). Also, our application file
storage is done on different infrastructure, so it doesn’t play into
this. Databases are also on different boxes.

I have not used GFS before, so I’m hoping for some input on some of
these questions:

  • I presume that for the actual Rails application code, since it gets
    loaded up once in production mode, that say 20 servers pulling that
    from a single GNBD/GFS file system server would be no biggy. Correct?

  • Logs - this seems to be the danger area to me. Assuming we have “high
    traffic”, and that we do quite a bit of logging (we log a lot of info
    for metrics and the ability to follow requests through the SOA
    architecture, etc.), I worry about 20 servers all writing to a single
    log on the one GNBD/GFS server. Valid worry, or? Are there
    alternatives I should look at for logging in such an environment?

  • Thoughts, comments, notes on this approach in general?


Chris B.
[email protected]

On Jun 7, 2007, at 10:43 PM, Chris B. wrote:

I have not used GFS before, so I’m hoping for some input on some of
these questions:

  • I presume that for the actual Rails application code, since it
    gets loaded up once in production mode, that say 20 servers pulling
    that from a single GNBD/GFS file system server would be no biggy.
    Correct?

Yeah it’s no biggy.

  • Logs - this seems to be the danger area to me. Assuming we have
    “high traffic”, and that we do quite a bit of logging (we log a
    lot of info for metrics and ability to follow requests through the
    SOA architecture, etc.), I worry about 20 servers all writing to a
    single log on the one GNBD/GFS server. Valid worry, or? Are there
    alternatives I should look at for logging in such an environment?

GFS has something called a context-dependent symlink. This lets you
make symlinks that resolve to a different path based on things like
hostname. So you set up a set of directories named after all the
hostnames in the cluster, then make log a symlink to @hostname. Observe:

ey00-s00070 ~ # cd /data/ey/shared/
ey00-s00070 shared # ls -lsa
total 32
4 drwxrwxr-x 7 ez ez 3864 Dec 3 2006 .
4 drwxr-xr-x 4 ez ez 3864 Dec 7 15:14 ..
4 drwxrwxrwx 2 ez ez 3864 Jun 1 13:30 ey00-s00070
4 drwxrwxrwx 2 ez ez 3864 Jun 4 22:07 ey00-s00071
4 lrwxrwxrwx 1 ez ez 9 Dec 3 2006 log -> @hostname

See how log is a symlink to @hostname? After you make a directory
named after each of the hostnames that share the filesystem, you do
this to link them:

$ ln -s @hostname log
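
Putting that together, a minimal sketch using the two node names from
the listing above (substitute your own hostnames; the paths are just
examples):

$ cd /data/ey/shared
$ mkdir ey00-s00070 ey00-s00071   # one directory per node hostname
$ ln -s @hostname log             # each node resolves log to its own directory

After that, point each app instance’s log path at the shared log
symlink; every node ends up writing under its own directory, so you
never have 20 nodes contending for one shared log file.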

  • Thoughts, comments, notes on this approach in general?

I have many, many nodes sharing GFS filesystems and it works great in
general; much more robust than NFS. I do it all off of a SAN, though,
so I have no experience with the way you are trying to do it with no
SAN.

Cheers-

– Ezra Z.
– Lead Rails Evangelist
[email protected]
– Engine Y., Serious Rails Hosting
– (866) 518-YARD (9273)

Thanks Ezra. Actually, I believe it was your talk at RailsConf that
inspired this. So thanks again! I suspect that, for the short term,
with only say 10-20 servers per GFS disk, it may be OK. If that all
works out and we scale up, then I’m sure we’d go SAN. Thanks for the
info on the contextual symlinks, that’s very cool.

How does GFS handle immense volume? Say you have these 20 servers
writing their logs to a single “disk”, and you’re being
slashdotted/dugg, so there’s tons of logging (I guess enough to
overwhelm GigE, though I haven’t calculated whether that’s realistic
in such a case). How does GFS behave?

Obviously I’ll have to test all this, but I’m hoping to short-circuit
any insurmountable problems, bad usage, etc.

On 6/7/07, Ezra Z. [email protected] wrote:


Chris B.
[email protected]

On Jun 7, 11:08 pm, Ezra Z. [email protected] wrote:

On Jun 7, 2007, at 10:43 PM, Chris B. wrote:

  • I presume that for the actual Rails application code, since it gets
    loaded up once in production mode, that say 20 servers pulling that
    from a single GNBD/GFS file system server would be no biggy. Correct?

Yeah it’s no biggy.

Hate to disagree with one of our own, but you’ll find that RHCS (Red
Hat Cluster Suite) has a practical limit of 16 machines per cluster,
unless you’re using GULM, which is no longer recommended. :-)

At Engine Y. we sidestep this limitation by using a two-tiered cluster
structure: one tier for the nodes in the cluster, and one for each
customer environment.

  • Logs - this seems to be the danger area to me. Assuming we have
    “high traffic”, and that we do quite a bit of logging (we log a
    lot of info for metrics and ability to follow requests through the
    SOA architecture, etc.), I worry about 20 servers all writing to a
    single log on the one GNBD/GFS server. Valid worry, or? Are there
    alternatives I should look at for logging in such an environment?

I’d recommend aggregating the logs via something akin to syslog, or
something based upon the wonderful but underutilized Spread library.
Far simpler and more robust.
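
For illustration, a minimal sketch of the syslog route, assuming stock
sysklogd, an arbitrary facility (local6), and a placeholder central
box named loghost:

# /etc/syslog.conf on each app server: forward the app facility
local6.*        @loghost

# quick test from any node
$ logger -p local6.info "hello from an app server"

On loghost, run syslogd with -r so it accepts remote messages, and
direct local6.* to a file there; you get one aggregated stream to grep
instead of 20 writers on a single GFS volume.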

GFS is great, don’t get me wrong, but I can absolutely guarantee you
that you’ll be disappointed if you put 50 machines into a single RHCS
cluster.


– Tom M., CTO
– Engine Y., Inc.

I was also going to recommend syslog or something similar. There was
a post recently about it here:

http://toolmantim.com/article/2007/6/6/logging_rails_to_syslog_with_sysloglogger
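
If you go the SyslogLogger route, the gist of the article is roughly
this (a sketch from my reading of it; double-check the gem name and
the 'myapp' program name against the post):

$ gem install SyslogLogger
# then in config/environment.rb:
#   require 'syslog_logger'
#   RAILS_DEFAULT_LOGGER = SyslogLogger.new 'myapp'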

On Jun 8, 11:06 am, “[email protected][email protected]

Yes, good point Tom. I had actually meant to switch to syslog logging,
but hadn’t gotten to it yet. I’ve heard Spread is excellent, and we’re
beginning to look at it for a few other things as well, so we’ll see.

As for servers, the 16 limit is interesting. Or rather, somewhat
confounding. I guess they expect you to move to a different system if
you are managing a lot more servers as part of a cluster, or that
you’d do direct-attached storage, etc.?

On 6/8/07, [email protected] [email protected] wrote:


Chris B.
[email protected]