Weird timeouts, not sure if I've set the right thresholds

On Sat, May 03, 2008 at 03:14:05AM -0400, Denis S. Filimonov wrote:

doesn’t have this issue since it typically works over UDP.
That’s only true under the assumption of empty IO queue.

At any rate, assuming the NFS server has the same disk latency, it adds the
network latency on top of that (CPU time is negligible compared to disk
seek). Roundtrip on a lightly loaded properly configured network takes under
1ms, i.e. an order of magnitude lower than a disk seek.

I agree, but any packet loss, retransmission, etc. will affect the whole
worker.

Also, as I understand it, in modern Linux it’s not easy to find the cause
of a stall: on FreeBSD, top/ps will show that a process is waiting on an
“nfsrcv” WCHAN or similar. NFS in modern FreeBSD has probably become much
better (I have seen many NFS commits from Yahoo!, Isilon, etc. developers),
but in one old setup (FreeBSD 4 as client and FreeBSD 3 as server) I saw
many Apache stalls after something went wrong with NFS.
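(For what it’s worth, the wait channel can be inspected with something like
“ps -ax -o pid,state,wchan,command” on FreeBSD; Linux ps also accepts a wchan
output keyword, though what it shows there is often less useful. The exact
keywords are from memory, so check ps(1).)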

On Saturday 03 May 2008 02:04:37 Igor S. wrote:

I do have a couple boxes serving a lot of traffic (mostly PHP) from NFS.
It works just fine, though it did take some NFS tuning.

All filesystem read operations are blocking operations, i.e. if a file page
is not in the VM cache, the process must wait for it. The only exception is
aio_read(), but it has its own drawbacks. A local filesystem with non-faulty
disks has a roughly constant blocking time: about 10-20ms, one disk seek.
NFS may block for longer.
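
For reference, a bare-bones sketch of what the POSIX aio_read() interface
looks like (error handling mostly omitted, the file path is just a
placeholder, and a real server would go do other work instead of
busy-waiting; on Linux link with -lrt):

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    static char   buf[4096];
    struct aiocb  cb;
    ssize_t       n;
    int           fd;

    fd = open("/var/www/index.html", O_RDONLY);  /* placeholder path */
    if (fd == -1)
        return 1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    if (aio_read(&cb) == -1)        /* queues the read and returns at once */
        return 1;

    while (aio_error(&cb) == EINPROGRESS)
        ;                           /* a server would handle other connections here */

    n = aio_return(&cb);            /* bytes read, or -1 on error */
    printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}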

That’s only true under the assumption of empty IO queue.

At any rate, assuming the NFS server has the same disk latency, it adds the
network latency on top of that (CPU time is negligible compared to disk
seek). Roundtrip on a lightly loaded properly configured network takes under
1ms, i.e. an order of magnitude lower than a disk seek.

And a blocked nginx worker cannot handle its other connections, even the
ones that could be served quickly from the VM cache, etc. You do not see
this in the PHP case, because each PHP process handles only a single
connection at a time.

That’s true, however one can increase the number of workers in proportion to
the latency increase to achieve the same level of concurrency, and in this
case it’s only about 10% (roughly 1ms of network roundtrip on top of a 10ms
disk seek). That’s really not a problem.

The problem with NFS happens when all the necessary data blocks are cached:
a local FS would just happily return the cached data without touching the
disk, while an NFS client still issues a request to check whether the file
has changed. Thus, NFS tends to flood the network with tiny requests, and
that’s the cause of its slowness. My point is that in most cases this can
easily be prevented by relaxing the cache coherency protocol without
sacrificing safety.
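
For instance, on a Linux client the re-validation traffic can be cut down
with mount options along these lines (option names from memory, values
purely illustrative; check nfs(5) before relying on them):

    mount -o ro,actimeo=60,nocto server:/export /var/www

actimeo stretches the attribute cache timeout so the client stops asking the
server about every file on every access, and nocto relaxes close-to-open
consistency; both are only reasonable when the exported data changes rarely
or in a controlled way.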

I tried to set it up once and I got confused, and lazy. Basically, it
sucks to make any switch, because depending on the type of switch,
it means downtime for my clients. They’re footing the bill for this setup
too…

As I’ve said, I’m going to try FreeBSD; it seems to have a better
memory/IO/VM subsystem, and stronger NFS too. (I say better because of
people’s complaints on other mailing lists about the current Linux
scheduler and other things: how it swaps and allocates memory, etc.)

Ideally the applications I host will be smart enough to use something
more distributed with less overhead (or at least capable of spreading
the overhead among a lot of nodes) - pNFS might be cool, some day…

On Fri, 2 May 2008 20:17:20 -0700
mike [email protected] wrote:

I am -not- happy about NFS but at the moment I don’t see any options
for removing it without changing all my clients to use some other
application-aware solution like MogileFS, etc. I’ve looked at AFS/Coda
and some other things.

Any reason you don’t want to use AFS? The read caching should be much
more aggressive than NFS, and you have the option of a memory-only
cache.
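
(If you do go that way: with OpenAFS the client cache manager can, as far as
I know, be started with something like “afsd -memcache -blocks 131072” to
keep the whole cache in RAM, -blocks being the size in 1K units; the exact
flags are from memory, so check afsd(8).)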