Hi,
Unfortunately the changes did not seem to be the silver bullet I was
hoping
for - however - although the processes are still sleeping, and the
servers
are still getting very high loads, it does seem to have helped -
downloads
are no longer unable to start, which is great!
I’m not sure if it’s helpful, but an strace on a sleeping process looks
like this:
io_submit(47044328767488, 1, {{0x13dab860, 0, 0, 0, 66}}) = 1
epoll_wait(9, {{EPOLLOUT, {u32=1554189657, u64=47044330982745}},
{EPOLLOUT,
{u32=1554179536, u64=47044330972624}}}, 512, 36) = 2
writev(250,
[{"\321\4\vS\0313\237F\222\337\246\322\33(\30\245=g\352\271\2\361\244p\240\377Q\314\2371^\256"…,
161636}], 1) = 20440
writev(69,
[{"\235\337}\33\201\214)\321\343\233\22\346$z\374\2126T\j\210\250L\250\331{\220\333\323\343\341J"…,
386840}], 1) = 130320
epoll_wait(9, {{EPOLLOUT, {u32=1554194624, u64=47044330987712}}}, 512,
9) =
1
writev(222,
[{"<\247\260P\237\2455\236\244\352!\237s\223h\25\206N3[{\351f\31\275\344b\5\204\f\v"..., 396976}], 1) = 95568 epoll_wait(9, {{EPOLLOUT, {u32=1554205480, u64=47044330998568}}}, 512, 26) = 1 writev(286, [{"E\260q\214\346X[\376\305\5\275\352\344
\256q\327\344m\r\236\t\321\354\200\325\333\351E\340\374\232"…,
240116}], 1) = 64240
epoll_wait(9, {{EPOLLOUT, {u32=1554195361, u64=47044330988449}}}, 512,
25)
= 1
writev(133,
[{"\243Y\373y\10\0252\34\32\22\2\36\227\325e\345\333\372=\365./\340\34V\251U\0373\234\35\250"…,
13732}], 1) = 13732
io_submit(47044328767488, 1, {{0x12599628, 0, 0, 0, 209}}) = 1
epoll_wait(9, {{EPOLLOUT, {u32=1554199961, u64=47044330993049}}}, 512,
5) =
1
writev(49,
[{"+\347^\17\322\354\201\20=\35\246b\200\0\214K’z>\344k\331\272Svh\234`\334)|\205"…,
176592}], 1) = 84120
epoll_wait(9, {}, 512, 1) = 0
epoll_wait(9, {}, 512, 4) = 0
epoll_wait(9, {{EPOLLOUT, {u32=1554179905, u64=47044330972993}}}, 512,
14)
= 1
epoll_wait(9, {{EPOLLOUT, {u32=1554193521, u64=47044330986609}}}, 512,
10)
= 1
writev(137,
[{"\212\375\216\330’\315^\20|\350N\362\25j\272\304=v\227\210?\3539S\343\6D\265C-\360J"…,
336856}], 1) = 96360
epoll_wait(9, {{EPOLLOUT, {u32=1554181193, u64=47044330974281}}}, 512,
9) =
1
writev(79,
[{"\321\277\340\360E\323A\352\352\377\357w\357m_\377\377R\0\200\177\365l\200
\314D\24z\21U\0"…, 228056}], 1) = 128480
epoll_wait(9, {}, 512, 3) = 0
epoll_wait(9, {}, 512, 8) = 0
epoll_wait(9, {}, 512, 2) = 0
epoll_wait(9, {{EPOLLOUT, {u32=1554204009, u64=47044330997097}}}, 512,
26)
= 1
writev(67, [{"\204-&
V\325?\375\33\202B\236\216\r\240\360\17\0103\25\274\3\300>\352\267\211BJ\265\23\327"…,
166588}], 1) = 26280
epoll_wait(9, {}, 512, 12) = 0
epoll_wait(9, {{EPOLLIN, {u32=6779072, u64=6779072}}}, 512, 14) = 1
read(10, “\1\0\0\0\0\0\0\0”, 8) = 8
io_getevents(47044328767488, 1, 64, {{0x13dab860, 0x13dab820, 524288,
0}},
{0, 0}) = 1
writev(80,
[{“X\361N8\2\214\203n\263t\240\\335\241k\212N\366\24\222\32\201u\267\272\32\v\326=\373\34\v”…,
524288}], 1) = 56608
epoll_wait(9, {{EPOLLOUT, {u32=1554187265, u64=47044330980353}}}, 512,
13)
= 1
epoll_wait(9, {{EPOLLOUT, {u32=1554183217, u64=47044330976305}}}, 512,
3) =
1
epoll_wait(9, {}, 512, 1) = 0
epoll_wait(9, {{EPOLLOUT, {u32=1554181744, u64=47044330974832}}}, 512,
13)
= 1
writev(121,
[{"\371s\222d\231\313\17\t\227\31\33a\315\304\365NZ7\323\200\347\337\260\355\253\203\30\215N\313\260d"…,
331027}], 1) = 49640
epoll_wait(9, {}, 512, 2) = 0
epoll_wait(9, {}, 512, 8) = 0
epoll_wait(9, {{EPOLLOUT, {u32=1554192968, u64=47044330986056}}}, 512,
12)
= 1
epoll_wait(9, {{EPOLLIN, {u32=6779072, u64=6779072}}}, 512, 12) = 1
read(10, “\1\0\0\0\0\0\0\0”, 8) = 8
io_getevents(47044328767488, 1, 64, {{0x12599628, 0x125995e8, 524288,
0}},
{0, 0}) = 1
writev(133,
[{"&yj\373dw\335\364\232k\310\6\204\363\365=c{V\257\6:\225\354\233\253b\27*\221\4\264"…,
524288}], 1) = 114748
epoll_wait(9, {{EPOLLOUT, {u32=1554203825, u64=47044330996913}}}, 512,
6) =
1
epoll_wait(9, {{EPOLLOUT, {u32=1554179536, u64=47044330972624}}}, 512,
5) =
1
writev(69,
[{“X\4\250\274\3415\212A\20D\30\2122.\23\351n%\226\245\250\242b$\271\t\22/fX\303\263”…,
256520}], 1) = 130320
epoll_wait(9, {{EPOLLOUT, {u32=1554192784, u64=47044330985872}}}, 512,
27)
= 1
An strace -c taken only when a process is in the ‘D’ state shows:
[root@HOST16 ~]# time strace -p 22334 -c
Process 22334 attached - interrupt to quit
Process 22334 detached
% time seconds usecs/call calls errors syscall
70.65 0.008273 109 76 io_submit
29.35 0.003437 10 360 writev
0.00 0.000000 0 26 read
0.00 0.000000 0 3 open
0.00 0.000000 0 8 close
0.00 0.000000 0 3 fstat
0.00 0.000000 0 1 ioctl
0.00 0.000000 0 1 socket
0.00 0.000000 0 1 1 connect
0.00 0.000000 0 11 3 recvfrom
0.00 0.000000 0 1 getsockname
0.00 0.000000 0 3 getsockopt
0.00 0.000000 0 44 fcntl
0.00 0.000000 0 26 io_getevents
0.00 0.000000 0 180 epoll_wait
0.00 0.000000 0 4 epoll_ctl
100.00 0.011710 748 4 total
real 0m8.570s
user 0m0.016s
sys 0m0.028s
When it’s out of sleeping state, about 25% of the time is spent in
epoll_wait. I’ve also noticed that vmstat shows far less frequent
swapping, although now instead of 5-10MB regularly, it’ll swap ~100MB
every
30+ seconds.
Cheers,
Drew