NFS and Rails - what's the deal?


#1

So for page caching across a cluster, or static resources in general
w/ a rails cluster, is NFS a bad idea? I’ve read a lot of posts where
folks talk about how unreliable NFS has been in the past, and it has
issues with locking and what not. My company has had issues as well
when servers are restarted and have issues reconnecting to the NFS
mount.

I wouldn’t mind page caching on each web server and just duplicating
data, if it weren’t for the problem of expiring data across machines.

The environment is RHEL 4, with at least a couple of web servers plus a
DB server and a few others. Fairly standard Mongrel/Apache 2.2/MySQL
deployment.

Any ideas/experiences? thanks…

  • Rob


http://robsanheim.com


#2

Hi,
I deployed a new Rails site earlier in the week and was looking at the
production log this morning and the completion time for every request
is:

Completed in 0.00010 (10000 reqs/sec) | Rendering: -0.00000 (-4%) | DB: 0.00000 (0%)

This wasn’t the case yesterday, “proper” completion times were being
shown, varying from 10 reqs a second to 150+.
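
For reference, the reqs/sec figure in that log line appears to be just the
reciprocal of the completion time. That is an assumption drawn from the
numbers shown above, not from the Rails source, and the helper name below
is hypothetical. A quick sketch of the arithmetic:

```ruby
# Hypothetical helper: derive a "reqs/sec" figure from a completion time,
# assuming the figure is simply 1 / elapsed seconds.
def reqs_per_sec(elapsed_seconds)
  raise ArgumentError, "elapsed time must be positive" unless elapsed_seconds > 0
  (1.0 / elapsed_seconds).round
end

puts reqs_per_sec(0.0001) # the suspicious log line above
puts reqs_per_sec(0.1)    # a more believable request
```

If that assumption holds, a constant 10000 reqs/sec points at the elapsed
time being measured wrongly rather than at requests actually finishing in
a tenth of a millisecond.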

I’m using fragment caching extensively but there are a number of
non-cached pages which definitely should be logging database and
rendering activity.

My deployment environment is Debian, Apache -> Mongrel (5 processes),
Rails 1.2.2, Ruby 1.8.5.

Can anybody shed any light on why this might be happening?

Thanks in advance,

Niall M.
Abecto Design
http://www.abecto.com/


#3

On Apr 5, 1:34 am, “Rob S.” removed_email_address@domain.invalid wrote:

The environment is RHEL 4, with at least a couple of web servers plus a
DB server and a few others. Fairly standard Mongrel/Apache 2.2/MySQL
deployment.

Any ideas/experiences? thanks…

At Engine Y. we use a clustered filesystem, GFS, and store the data
on a remote disk bank via AoE (ATA over Ethernet).

You could choose iSCSI, but it’s slower, and Fibre Channel is
fantastic if you can afford it. :)

This has the enormous advantage of page cache coherence, full POSIX
file semantics including locking, and can be fully fault tolerant.
Disk access between two nodes is identical to disk access between two
processes on the same node.

Performance is a lot better than NFS, particularly so when you have
fast disks and big pipes to the disks, as you don’t have to share the
network interface(s) with disk I/O.

There are some potential disadvantages of a system like this. It
fundamentally violates shared nothing, which means you cannot scale to
infinity with this, but it can likely scale to somewhere around the 95th
percentile of all internet websites…

Somewhere around the 95th percentile, all bets are off no matter how you
start, so I don’t consider this a real problem. :)


– Tom M., CTO
– Engine Y., Ruby on Rails Hosting
– Reliability, Ease of Use, Scalability
– (866) 518-YARD (9273)


#4

On Thu, 5 Apr 2007 03:34:50 -0500
“Rob S.” removed_email_address@domain.invalid wrote:

So for page caching across a cluster, or static resources in general
w/ a rails cluster, is NFS a bad idea? I’ve read a lot of posts where
folks talk about how unreliable NFS has been in the past, and it has
issues with locking and what not. My company has had issues as well
when servers are restarted and have issues reconnecting to the NFS
mount.

In most of the cases I’ve seen people using NFS they tend to run into
problems either with reliability or with overloading traffic. On the
reliability end, they expect NFS to work transactionally for reads and
writes of entire files, or expect the locks to work flawlessly. With
overloading traffic it seems they expect NFS to take less traffic than
HTTP (partially true) but then slam tons of file reads/writes over NFS
and HTTP at the same time. In some cases I’ve seen people effectively
send the contents of a file 3-4 times over the network for each single
HTTP request.

I’m sure other people could chime in with their setups, but the only
configuration I’ve seen work reasonably well is the following:

  1. Set up a “static server” that has an NFS-mountable and writable
    directory.
  2. This static server then has a fast as hell HTTP server (nginx works
    well) that reads the files in this mountable directory straight off the
    local disk, NOT OFF NFS.
  3. All of the rails backends then write to the NFS mount to put the
    asset onto the static server, and write their URLs to point at
    this new static asset URL instead of the rails URL, or do a redirect.

This works best for assets that you can classify and which aren’t going
to change often. The key is that you’ve written the asset fully BEFORE
you send the response that the client uses to read the asset. If you
try to do it in parallel then the client could end up reading a
partially written asset.
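
One common way to get that "fully written before anyone reads it" property
is to write to a temporary file in the same directory and rename it into
place. This is a minimal sketch of that idea, not something described
above: the method names, the `.tmp` suffix, and the static.example.com base
URL are all hypothetical, and how atomic a rename looks to other machines
depends on the filesystem and NFS setup.

```ruby
require "fileutils"
require "securerandom"

# Hypothetical base URL of the static server (an assumption for this sketch).
STATIC_BASE_URL = "http://static.example.com"

# Write data to dir/name so that a reader of the final path sees either
# the old file or the complete new one, never a partial write.
def publish_asset(dir, name, data)
  final_path = File.join(dir, name)
  FileUtils.mkdir_p(File.dirname(final_path))

  # Temp file lives next to the target so the rename stays on one filesystem.
  temp_path = "#{final_path}.#{SecureRandom.hex(4)}.tmp"
  File.open(temp_path, "wb") do |f|
    f.write(data)
    f.fsync # push the bytes out before the file becomes visible
  end

  File.rename(temp_path, final_path)
  final_path
end

# Build the URL the backend hands to the client, only after publish_asset returns.
def asset_url(name)
  "#{STATIC_BASE_URL}/#{name}"
end
```

A backend would call publish_asset first and only then render a response
containing asset_url(name), which matches the ordering described above.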

There are also problems with this NFS usage potentially stopping a
Mongrel server if NFS has problems writing the file. I actually don’t
advise NFS, so you can also try other options like Samba, Lustre, Coda,
etc.


Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://www.awprofessional.com/title/0321483502 – The Mongrel Book
http://mongrel.rubyforge.org/


#5

On Thu, 05 Apr 2007 08:20:14 -0700
“removed_email_address@domain.invalid” removed_email_address@domain.invalid wrote:

There are some potential disadvantages of a system like this. It
fundamentally violates shared nothing, which means you cannot scale to
infinity with this, but it can likely scale to somewhere around 95th
percentile of all internet websites…

That’s another point I forgot to make: this should be a last resort.
I hate the word “should” but really, if you haven’t tried to tune up a
basic shared-nothing design to the fastest you can before running
toward insanely complex caching designs then you’ve gone in the wrong
direction.

Many, many times I’ve seen simple little changes and tweaks with bits
of strategic rework and caching boost a site that had a good initial
shared-nothing design way beyond what a heavy shared cache site could
pull off.

In other words, no amount of distributed caching can help a poorly
designed system.


Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu
http://www.zedshaw.com/
http://www.awprofessional.com/title/0321483502 – The Mongrel Book
http://mongrel.rubyforge.org/


#6

On 4/5/07, Zed A. Shaw removed_email_address@domain.invalid wrote:

I hate the word “should” but really, if you haven’t tried to tune up a
basic shared-nothing design to the fastest you can before running
toward insanely complex caching designs then you’ve gone in the wrong
direction.

I agree, complex is bad and should be the last thing you worry about.
But it seems like if you want to do decent page caching in a clustered
rails setup, you have two options, both of which are complex in their
own way:

  • just page cache on each web server, and then handle expiration by
    kicking off jobs across the network on the other machines to remove
    stale content. So each server has its own copy of the cache, and we
    keep things in sync through some annoying but simple scripts. Or if
    we don’t care too much about stale content, remove the page cache on a
    time-based schedule across the servers…if it’s short enough our
    visitors won’t notice or won’t care.

  • share a single page cache across the cluster via NFS/GFS/etc, which
    adds all the complexity and issues you guys have discussed.
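
To make the time-based variant of the first option concrete, here is a
rough sketch of a sweep that each web server could run from cron against
its own local page cache. The method name, the directory layout, and the
five-minute window in the usage note are assumptions for illustration only.

```ruby
# Delete cached files under cache_dir whose mtime is older than
# max_age_seconds, and return the paths that were removed.
# `now` is a parameter so the cutoff is easy to control in tests.
def sweep_stale_pages(cache_dir, max_age_seconds, now = Time.now)
  removed = []
  # Collect the file list first, then delete, so we never modify a
  # directory while it is still being walked.
  candidates = Dir.glob(File.join(cache_dir, "**", "*")).select { |p| File.file?(p) }
  candidates.each do |path|
    next unless now - File.mtime(path) > max_age_seconds
    File.delete(path)
    removed << path
  end
  removed
end
```

Running something like sweep_stale_pages("/path/to/public/cache", 300) every
minute on each server would bound staleness to roughly five minutes without
any cross-machine coordination, at the cost of regenerating pages that had
not actually changed.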

Or just say screw it, and use action/fragment caching in memcached.

Many, many times I’ve seen simple little changes and tweaks with bits
of strategic rework and caching boost a site that had a good initial
shared-nothing design way beyond what a heavy shared cache site could
pull off.

Definitely - I’m amazed how many sites aren’t using any HTTP
compression yet, even w/ mod_deflate being a core Apache module for
years now. I also wonder how many sites are still using the default
Apache conf file, with all those unneeded modules loaded…

  • Rob


http://robsanheim.com