I am using nginx + php-fpm (quad-core PC) + MySQL (on another PC) and
facing an interesting issue:
after nginx has been running for some time, the system “load average” goes up
(sometimes to 30 points and even more) while CPU usage is still less
than 30%. With such a high load the system becomes very slow and
unresponsive. Restarting nginx helps, but only for a little while. Any
ideas on the matter?
You fail to mention what sort of disk subsystem you are using. High
load is often indicative of I/O bottlenecks. You also don’t mention
what OS you are using, but if it’s Linux you should install iotop and
take a look at what process is driving the load (it might be Nginx, but
more likely MySQL; my guess is that restarting Nginx simply cancels
whatever pending requests MySQL had).
You fail to mention what sort of disk subsystem you are using.
One SCSI disk
High load is often indicative of I/O bottlenecks.
You also don’t mention what OS you are using,
CentOS release 5.4
you should install iotop and take a look at what process is driving the load
Here is the snippet from iotop output
Total DISK READ: 3.33 K/s | Total DISK WRITE: 3.27 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>  COMMAND
20264  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
21604  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
21605  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
20054  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
20902  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
20275  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
20828  be/4  nobody    0.00 B/s   0.00 B/s   0.00 %  99.99 %  php-cgi --fpm --fpm-config /etc/php-fp
Looks like php-cgi makes 99.99% IO, but not really sure what this means
(it might be Nginx, but more likely MySQL - restarting Nginx simply cancels whatever pending
requests MySQL had is my guess).
MySQL is running on another PC, so php makes network connections to DB.
It may take a while, of course, but shouldn't PHP be blocked and
sleeping while the network subsystem is processing the request?
Here is the snippet from iotop output
Looks like php-cgi makes 99.99% IO, but not really sure what this means
This means that php-cgi is spending 99.99% of its time in I/O, not that
it’s creating 99.99% of your I/O. It means php-cgi is waiting almost
all the time.
(it might be Nginx, but more likely MySQL - restarting Nginx simply
cancels whatever pending
requests MySQL had is my guess).
MySQL is running on another PC, so php makes network connections to
DB. It may take a while, of course, but shouldn't PHP be blocked
and sleeping while the network subsystem is processing the request?
Yes, and this is certainly what is happening. You have to remember
that “load” is an indication of how many processes are waiting to be
scheduled, and this includes processes that are waiting on I/O. In this
case it looks like PHP isn’t generating a lot of I/O, rather it is waiting, creating a high load, and at the same time, you are running
out of PHP processes to handle new requests, which is why the box slows
down.
I’d follow these steps:
check the state of the MySQL server. See if this machine is
overloaded.
check MySQL itself. Turn on query logging and see if you have some
query that’s taking a long time to complete.
take a look at your network to make sure that there isn’t an issue
there (saturation, packet loss, etc).
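The "load counts processes waiting on I/O" claim above can be checked directly during a spike: Linux includes tasks in uninterruptible sleep (state D) in the load average. A minimal sketch (not from the thread; assumes a procps-style ps):

```shell
# Count tasks in uninterruptible sleep (state D). Linux includes these
# in the load average alongside running/runnable tasks, so load 30 with
# mostly idle CPUs usually means ~30 tasks blocked in D state.
ps -eo state=,comm= | awk '$1 == "D" { print $2; n++ } END { print n+0, "tasks in D state" }'
```

Running this during a spike also shows which program (php-cgi, mysqld, etc.) the blocked tasks belong to.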
This means that php-cgi is spending 99.99% of its time in I/O, not that
it’s creating 99.99% of your I/O. It means php-cgi is waiting almost
all the time.
This is reasonable and fully corresponds to the PHP script's action
pattern: gather data for querying MySQL, issue a prepared query and wait
for the results.
Taking into account that MySQL is running on another box, PHP is waiting
almost all the time for results.
Yes, and this is certainly what is happening.
You have to remember that “load” is an indication of how many processes are waiting to be
scheduled, and this includes processes that are waiting on I/O.
In this case it looks like PHP isn’t generating a lot of I/O, rather it is waiting, creating a high load,
Agreed, “load” indicates the average number of processes that are running,
runnable or in uninterruptible sleep.
A PHP process waiting for a network response must then be in uninterruptible
sleep state (it is certainly not running, and hardly runnable).
So, waiting PHP processes can make the “load average” very high.
and at the same time, you are running out of PHP processes to handle new requests,
Correct me if I am wrong, but when nginx lacks a ready PHP process, an “11:
Resource temporarily unavailable while connecting to upstream” message
is put into error.log.
I saw plenty of these during initial system setup, but after adding
php-fpm children all goes well: no such messages in the logs and no “5xx
error” pages reported by users for a long time.
I am monitoring the logs carefully, so I suppose all is OK with the
number of PHP processes.
which is why the box slows down.
not sure yet
I’d follow these steps:
check the state of the MySQL server. See if this machine is overloaded.
MySQL box is fine and running, no excessively high load or any type of
slowing down
check MySQL itself. Turn on query logging and see if you have some
query that’s taking a long time to complete.
There are several queries that can take up to several seconds to
complete, looking into optimizations
take a look at your network to make sure that there isn’t an issue
there (saturation, packet loss, etc).
Can you give any advice on how to check saturation of the link between
these two boxes?
The boxes are standing close to each other and network transmission
looks fine as well.
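One cheap way to look for link trouble is to watch the per-interface counters; a hedged sketch reading /proc/net/dev (field positions assume the usual Linux "iface: bytes packets errs drop …" layout):

```shell
# Print per-interface RX packets/errors/drops from /proc/net/dev.
# Sample it twice a few seconds apart under peak load: error/drop
# counters that climb while traffic is high hint at saturation or a
# bad link.
awk -F'[: ]+' 'NR > 2 { print $2, "rx_pkts=" $4, "rx_errs=" $5, "rx_drop=" $6 }' /proc/net/dev
```

For loss and latency, `ping -c 20 <db-host>` between the boxes and `netstat -s | grep -i retrans` (TCP retransmit counters) round out the picture.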
Maybe I am missing something, but I see the general overview of the
task as follows:
Machine A hosts N processes waiting for network reply. (Blocked on
recv(), I assume)
Machine B is a “Black Box”, running something which produces results for
machine A. At the moment it doesn’t matter what it is and how long it
takes to produce a result. Let’s assume time T.
An increasing number of waiting processes on machine A slows it down.
Sounds strange, though. I thought that a sleeping process consumes memory
only and doesn't impact performance. Maybe it is something related to
task scheduling? Does a big number of sleeping processes impact performance
and/or slow down the scheduler?
I am using nginx + php-fpm (quad-core PC) + MySQL (on another PC) and facing an interesting issue:
after nginx has been running for some time, the system “load average” goes up (sometimes to 30 points and even more) while CPU usage is still less than 30%. With such a high load the system becomes very slow and unresponsive. Restarting nginx helps, but only for a little while. Any ideas on the matter?
You should check CPU%, Disk IO% and probably look for
operations on the wait queue. sysrec does this on Solaris;
I'm working to port the recorder to Linux and FreeBSD.
You should somehow count for yourself:
CPU # usr + sys time across all CPUs
Memory # free RAM. freemem from availrmem
Disk # %busy. r+w times across all Disks
Network # throughput. r+w bytes across all NICs
Saturation,
CPU # threads on the run queue
Memory # scan rate of the page scanner
Disk # operations on the wait queue
Network # errors due to buffer saturation
Load average is a metric which might not really give you a very useful
picture unless you really know what it means and how to interpret it. Check it
out:
as already indicated, your php-cgi processes are waiting for I/O completion
from other sources: the database backend, or other backends if you use any. This
might indicate why you have a high percentage of iowait. You gotta
drill down and find what's going on:
webserver - database connection:
check network link, speed, full duplex
check TCP/sec: connections ESTABLISHED, to understand what your load is
check TCP/IP settings on Linux, on both machines: web and db servers
vmstat: would as well give you an overall picture of cpu/mem
what does it say on your box: vmstat 1 10 ? Run this as well
on your DB server.
check free column
check si, so
You should check CPU%, Disk IO% and probable look for
operations on the wait queue. sysrec does this on Solaris,
Im working to port the recorder on Linux, FreeBSD.
You should somehow count for yourself:
I am using the sar tool at the moment and unfortunately am not experienced
enough to port SDR from Solaris myself.
Here is snippet from sar output for typical situation when the system
slows down:
CPU %user %nice %system %iowait %steal %idle
all 7.92 0.00 7.75 46.93 0.00 37.40
all 8.43 0.00 8.31 48.12 0.00 35.14
all 8.03 0.00 8.01 45.84 0.00 38.12
all 8.39 0.00 8.43 46.48 0.00 36.69
all 9.89 0.11 7.32 54.83 0.00 27.85
all 9.09 0.00 8.87 49.75 0.00 32.30
all 8.96 0.00 9.09 48.89 0.00 33.06
all 10.02 0.00 9.60 51.23 0.00 29.15
Disk IO, CPU% and look for signs of saturation.
cpuplayer (is your machine spending more time in SYS/USER ?)
useful to collect longer trends in order to see whats going on
From this table we can tell that the CPU is not completely used: there is about 30% idle, and user and system CPU consumption are almost equal (is that good or bad? what ratio is expected?). %iowait looks high (relative to the other metrics) but I am not quite sure what it means.
Does anybody see something leading to slowness in these metrics?
we have nginx -> php + fastcgi -> db … others
vmstat tells us that there are some processes waiting for IO
(the b column), and most likely these are the connections mapped to
the fastcgi process. So nginx passed the requests to the
fcgi process (php-cgi etc…) and then fcgi will have to do
its work.
Could this be a problem of the php/fcgi process(es) running on that
box? I mean the php-cgi, which has nothing to do with the nginx
server. Configuration, settings of the FCGI/PHP module?
Good is to have the latest patches applied, etc. To me this is a good
candidate for DTrace (Solaris) or SystemTap (Linux). Probably check php/cgi
settings, nginx settings for fcgi, Linux kernel patches, if any. As well, try
to experiment with a much smaller application first. See if you observe the
same thing (good to have a test env where you could reproduce this sort of
thing).
vmstat: would as well give you an overall picture of cpu/mem
what does it say on your box: vmstat 1 10 ?
Run this as well
on your DB server.
check free column
check si, so
We have about 540Mb of free memory and small si activity (there is only one
32K page read presented in the snippet, but actually I've seen several
more 32K reads). As expected, plenty of processes in “uninterruptible
sleep” state. Looks like enough free resources (mem, CPU), but the system is
very slow. I am beginning to think that processes in “uninterruptible
sleep” state are causing the system to have scheduling lag. It looks to me
like they do not consume CPU, but slow down the task scheduler. Can that be
an issue?
This means that php-cgi is spending 99.99% of its time in I/O, not that
it’s creating 99.99% of your I/O. It means php-cgi is waiting almost
all the time.
This is reasonable and fully corresponds to the PHP script's action pattern: gather data for querying MySQL, issue a prepared query and wait for the results.
Taking into account that MySQL is running on another box, PHP is waiting almost all the time for results.
Well, all it means is that the bottleneck lies outside PHP. I wouldn’t
read too much else into it. It just means that 99% of your slowness is
elsewhere.
and at the same time, you are running out of PHP processes to handle new requests,
Correct me if I am wrong, but when nginx lacks a ready PHP process, an “11: Resource temporarily unavailable while connecting to upstream” message is put into error.log.
I saw plenty of these during initial system setup, but after adding php-fpm children all goes well: no such messages in the logs and no “5xx error” pages reported by users for a long time.
I am monitoring the logs carefully, so I suppose all is OK with the number of PHP processes.
Maybe. Or maybe you’ve simply disguised the problem by throwing more
processes at it. How many PHP processes are you running? Can you
provide your php-fpm parameters? Also, what’s an approximation of your
requests per second during these peak times?
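For reference, these are the knobs in question. A hedged sketch of a pool section in the ini syntax used by the php-fpm bundled with PHP 5.3.3+ (older patched builds, like the OP's, use an XML config file with equivalent settings; the values here are placeholders, not recommendations):

```ini
; Hypothetical php-fpm pool section -- values are illustrative
[www]
listen = 127.0.0.1:9000
pm = static
pm.max_children = 500              ; workers in this pool
request_slowlog_timeout = 5s       ; dump a trace for requests slower than 5s
slowlog = /var/log/php-fpm/www-slow.log
```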
which is why the box slows down.
not sure yet
I’d follow these steps:
check the state of the MySQL server. See if this machine is overloaded.
MySQL box is fine and running, no excessively high load or any type of slowing down
Again, load isn’t always indicative of “speed”. How are you sure it’s
responding quickly? Have you compared the times of queries during
periods of load vs idle times?
check MySQL itself. Turn on query logging and see if you have some
query that’s taking a long time to complete.
There are several queries that can take up to several seconds to complete, looking into optimizations
Going out on a limb: would it be possible to temporarily replace these
slow queries with something faster? That is, drop the query results
into a table and use that in place of the actual query?
I’d be curious to see the output of iotop on your MySQL server as well.
Machine B is a “Black Box”, running something which produces results
for machine A. At the moment it doesn’t matter what it is and how long
it takes to produce a result. Let’s assume time T.
I cannot possibly fathom how the length of T doesn’t matter. To me
that is probably the single most important data point to investigate.
The entire response time directly depends on T and you say it “doesn’t
matter”?
Increasing number of waiting processes on machine A slows it down.
Why do you think machine A is slow? Machine A waits for machine B and
you blame machine A. I don’t see how you arrive at this conclusion.
To me it seems it could be either machine (or the connection between the
machines) and more testing needs to be done to arrive at such a
conclusion.
I’d be more interested in how T varies as number of sleeping processes
increases.
Sounds strange, though. I thought that a sleeping process consumes memory
only and doesn't impact performance. Maybe it is something related to
task scheduling? Does a big number of sleeping processes impact performance
and/or slow down the scheduler?
Not since kernel 2.6, AFAIK. How many processes are we talking about
here?
As an aside, one thing that you mentioned earlier that I was wondering
about: what is writing to the local disk at 3.27MB/s (from iotop
output)?
We can say 1157 HTTP sessions are established (and may be kept alive)
nicstat
    Time    Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util   %Sat
10:24:02     lo    0.33    0.33   290.8   290.8    1.16    1.16   0.00   0.00
10:24:02   eth0    0.29    0.79  1183.8  1448.1    0.25    0.56   0.00   0.00
10:24:03     lo   153.7   153.7   338.6   338.6   464.7   464.7   0.00   0.00
10:24:03   eth0  1070.9   768.7  3599.4  3987.9   304.7   197.4   0.00   0.00
10:24:04     lo   65.22   65.22   191.9   191.9   348.1   348.1   0.00   0.00
10:24:04   eth0   932.5   691.3  3250.8  3437.7   293.7   205.9   0.00   0.00
10:24:05     lo   522.5   522.5   687.5   687.5   778.2   778.2   0.00   0.00
10:24:05   eth0  1075.6   988.8  3758.0  4308.4   293.1   235.0   0.00   0.00
10:24:06     lo   235.0   235.0   551.9   551.9   436.0   436.0   0.00   0.00
10:24:06   eth0   855.4   857.6  3476.3  3761.2   252.0   233.5   0.00   0.00
10:24:07     lo   135.3   135.3   351.9   351.9   393.6   393.6   0.00   0.00
10:24:07   eth0  1021.0  1025.4  3970.6  4591.4   263.3   228.7   0.00   0.00
10:24:08     lo   168.1   168.1   492.3   492.3   349.6   349.6   0.00   0.00
10:24:08   eth0   793.6   864.8  3340.2  3875.5   243.3   228.5   0.00   0.00
10:24:09     lo   32.88   32.88   78.97   78.97   426.4   426.4   0.00   0.00
10:24:09   eth0   551.1   462.1  2416.2  2461.1   233.6   192.3   0.00   0.00
Good is to have latest patches applied etc. To me
this is a good candidate for DTrace (Solaris) or SystemTap (linux).
I am getting acquainted with SystemTap; it doesn't look like a “5 minutes
to learn” system.
Cliff W. Wrote:
I've arranged the answers to the questions grouped by theme.
Well, all it means is that the bottleneck lies outside PHP.
I wouldn’t read too much else into it.
It just means that 99% of your slowness is elsewhere.
I cannot possibly fathom how the length of T doesn’t matter.
To me that is probably the single most important data point to investigate.
The entire response time directly depends on T and you say it “doesn’t matter”?
Why do you think machine A is slow?
Machine A waits for machine B and you blame machine A.
I don’t see how you arrive at this conclusion.
To me it seems it could be either machine (or the connection between the machines)
and more testing needs to be done to arrive at such a conclusion.
Combined answer to all these questions. It looks like we are not talking
about the same thing here. You are talking about the response time for a
network request issued by a user. It really consists of several stages of
processing here, like:
HTTP request handled by nginx
data processing by PHP (provided via CGI interface with php-fpm in my
case)
external DB request processing (aka T in our discussion)
response to user
and in this case I totally agree with you - time T does matter and
directly influences response time.
What I am talking about is a little bit different. In peak hours
response time degrades significantly, but is still more or less
acceptable; what is unacceptable is that machine A slows down and
responds to external actions (like SSH login, VPN connection) very
slowly. For example, I sometimes can't even establish a VPN connection to
it due to timeouts (there is an openvpn server running on it). That's why
I am talking about “slow machine A” and blame it. That's why I am
worried about “uninterruptible sleep” processes and thinking about
scheduling lag.
I’d be curious to see the output of iotop on your
MySQL server as well.
here it is:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER    DISK READ  DISK WRITE  SWAPIN    IO>  COMMAND
23144  be/4  mysql    0.00 B/s  19.10 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
  880  be/4  mysql    0.00 B/s  11.46 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 1247  be/4  mysql    0.00 B/s  11.46 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 1287  be/4  mysql    0.00 B/s  53.49 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 1305  be/4  mysql    0.00 B/s  22.92 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 7929  be/4  mysql    0.00 B/s  15.28 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
30493  be/4  mysql    0.00 B/s  15.28 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
10222  be/4  mysql    0.00 B/s   7.64 K/s   0.00 %  0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
    1  be/4  root     0.00 B/s   0.00 B/s   0.00 %  0.00 %  init [3]
    2  be/3  root     0.00 B/s   0.00 B/s   0.00 %  0.00 %
    3  rt/3  root     0.00 B/s   0.00 B/s   0.00 %  0.00 %
and another sample
Total DISK READ: 0.00 B/s | Total DISK WRITE: 107.53 K/s
  TID  PRIO  USER    DISK READ  DISK WRITE  SWAPIN     IO>  COMMAND
  709  be/3  root     0.00 B/s  55.62 K/s   0.00 %  45.85 %
 4129  be/4  mysql    0.00 B/s   3.71 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 4147  be/4  mysql    0.00 B/s   3.71 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 4233  be/4  mysql    0.00 B/s   7.42 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 4273  be/4  mysql    0.00 B/s   7.42 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 6587  be/4  mysql    0.00 B/s  14.83 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
15008  be/4  mysql    0.00 B/s   3.71 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
 7449  be/4  mysql    0.00 B/s  11.12 K/s   0.00 %   0.00 %  mysqld --basedir=/usr~/lib/mysql/mysql.sock
looks like nothing special to me. MySQL is running, without much I/O.
Maybe it is something related to task scheduling? Does a big number of sleeping processes impact performance and/or slow down the scheduler?
Not since kernel 2.6, AFAIK. How many processes are we talking about here?
nginx worker_processes 4
php-fpm max_children 1000 total
and system processes of course, but nothing much
Maybe. Or maybe you’ve simply disguised the
problem by throwing more processes at it. How many PHP processes are you
running? Can you provide your php-fpm parameters? Also, what’s an
approximation of your requests per second during these peak times?
2 pools with 500 php-fpm children each and something like 1000-2000
concurrent HTTP sessions
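A thousand workers on one quad-core box raises a memory question that is easy to check; a small sketch (the worker process name is assumed to be php-cgi, as in the iotop output above):

```shell
# Sum the resident set size (RSS, in KB) of all php-cgi workers.
# 1000 workers at even 20 MB apiece would want ~20 GB of RAM, far more
# than a typical quad-core PC of the era, which would force swapping.
ps -C php-cgi -o rss= | awk '{ sum += $1 } END { printf "%.1f MB resident in php-cgi workers\n", sum / 1024 }'
```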
As an aside, one thing that you mentioned earlier that I was wondering
about: what is writing to the local disk at 3.27MB/s (from iotop output)?
What I am talking about is a little bit different. In peak hours
response time degrades significantly, but is still more or less
acceptable; what is unacceptable is that machine A slows down and
responds to external actions (like SSH login, VPN connection) very
slowly. For example, I sometimes can't even establish a VPN connection
to it due to timeouts (there is an openvpn server running on it). That's
why I am talking about “slow machine A” and blame it. That's why I am
worried about “uninterruptible sleep” processes and thinking about
scheduling lag.
Ah, I did not gather this from your previous explanations. Yes, this
is much different than what I was attempting to solve.
Have you done any tuning of your system to handle high loads or is this
a default CentOS install?
You are talking about a system slowdown caused by your current
workload. This might be caused by a series of things, some related
to the kernel. Most likely, analysing what's going on here with
SystemTap might help. That's why I keep telling people DTrace is like
a gold mine, or a step into the future. Since you are not on Solaris
you need to start looking into SystemTap. If possible, have a box with
Solaris or FreeBSD next to it running this workload and check with the
DTraceToolkit.
I have started playing with SystemTap, and FreeBSD is coming next in line.
Thanks for the advice.
Cliff W. Wrote:
Have you done any tuning of your system to handle
high loads or is this a default CentOS install?
no patches, and only small tuning of FS and network options
I am using nginx + php-fpm (quad-core PC) + MySQL (on another PC) and
facing an interesting issue:
after nginx has been running for some time, the system “load average” goes up
(sometimes to 30 points and even more) while CPU usage is still less
than 30%. With such a high load the system becomes very slow and
unresponsive. Restarting nginx helps, but only for a little while. Any
ideas on the matter?
php-fpm has a slow log feature. You can configure it in php-fpm.conf
and check the php-fpm slow log later. In the slow log, php-fpm will tell
us where the slow parts of the php scripts are.
If we see mysql_query(), then it means that SQL is slow. We should check
the MySQL slow log; maybe we can use a MySQL slow-log analysis tool to figure
out the slow queries.
If we see file_get_contents(), then it means there is something wrong
with the file system; maybe you are reading files from network file systems
like NFS.
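On the MySQL side, the slow log mentioned above is enabled in my.cnf; a hedged fragment using the MySQL 5.0-era option names (paths and the threshold are placeholders; MySQL 5.1+ renamed these to slow_query_log / slow_query_log_file):

```ini
# Hypothetical my.cnf fragment -- adjust paths to your layout
[mysqld]
log-slow-queries = /var/log/mysql/slow.log
long_query_time  = 2        # seconds; queries slower than this get logged
```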
I don't think there's any network problem; it looks much more like your
system is running out of disk IO bandwidth. There's a continuous stream of
data being sent to disk (bo). The actual rate (~4000 blocks/s) isn't that
high if it's purely linear write, but my guess is you're getting quite a
bit of random IO going on. That's causing lots of processes to block.
What's probably also happening is that your sshd and vpn server are being
swapped out (swapped = 1.1G, free = 540M), so when you try to connect, they
have to be swapped back in. Because of all the IO being pushed to disk, the
reads are starved, and swapping them in takes a while, causing your
connections to time out.
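Whether sshd and the vpn server really sit in swap can be confirmed directly on newer kernels; a sketch (the VmSwap field only appears in /proc/<pid>/status on kernels 2.6.34+, so on a stock CentOS 5.4 kernel you would instead infer it from vmstat's si column while connecting):

```shell
# Print swapped-out memory for each sshd process (hypothetical check;
# requires the VmSwap field, kernel >= 2.6.34; adjust the process name).
for pid in $(pgrep -x sshd); do
    awk -v p="$pid" '/^VmSwap/ { print "sshd pid " p ": " $2 " " $3 }' "/proc/$pid/status"
done
```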
Basically the question to answer is: what are your php processes
continuously writing to the local disk so much for, and why?