TIME_WAITs and sustained connections (was: Re: Memcached module -- unix domain socket support?)

#1

On Thu, Aug 7, 2008 at 11:03 AM, Chavelle V. removed_email_address@domain.invalid
wrote:

I'm definitely not an expert, but when I read your post, that's the idea I
had as well.

From what I've read, ncache works by caching items from disk into memory.
The problem I have is that the media changes every 250ms.

Anyway, an update: I have tweaked the kernel IPv4 parameters, and even
though I have at times 23,000 sockets in TIME_WAIT, they are now being
recycled. The server is sustaining about 50 Mbit/s of traffic going
through memcached->nginx. My connection count hovers around 220 according
to the nginx_stats module. Is this the norm? I am wondering if I have an
artificial bottleneck somewhere.
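
For reference, a quick way to watch these counts from the shell (a rough
sketch; memcached's default port 11211 is assumed):

$ netstat -ant | grep -c TIME_WAIT
$ netstat -ant | grep ':11211 ' | grep -c ESTABLISHED

The first line counts sockets waiting to be recycled, the second the live
connections to memcached.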

Looking more into unix domain sockets, I am not sure they would be the
most appropriate solution to the problem. I noticed there was some talk
about persistent connections to upstream servers; is this planned for
the memcached upstream? (I am using this model on the app server side
to feed memcached and it is working great.)
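
What I have in mind would look something like the following (only a
sketch; the keepalive directive assumes an upstream-keepalive-capable
nginx, which 0.6.x does not provide out of the box):

upstream memcached_backend {
    server 127.0.0.1:11211;
    keepalive 32;    # keep up to 32 idle connections open per worker
}

server {
    listen 80;
    location / {
        set $memcached_key $uri;
        memcached_pass memcached_backend;
    }
}

With something like this, requests would reuse already-open connections
to memcached instead of paying the connect/TIME_WAIT cost every time.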

Cheers
Kon

#2

What is the operating system?
What do you mean by "tweaked" (details please)?

regards,
lix

#3

What is the operating system?
What do you mean by "tweaked" (details please)?

I am seeing the same behavior on CentOS 4.5 with nginx 0.6.32 and
memcached 1.2.6.

After running load tests with ApacheBench, I see a high number of
TIME_WAIT connections in netstat. The load test hits nginx, which
fetches a static response from memcached.

$ uname -a
Linux geocode1.admin.zvents.com 2.6.9-55.ELsmp #1 SMP Wed May 2 14:04:42 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
CentOS release 4.5 (Final)

$ netstat -an |grep -c TIME_WAIT
6

$ ab -c 4 -n 20000 'http://localhost/geocode?address=94131'

$ netstat -an |grep -c TIME_WAIT
34444

If I run the test again, I see a number of errors in error.log that look
like this:

2008/01/31 20:49:30 [crit] 24806#0: *70538 connect() to 127.0.0.1:11211
failed (99: Cannot assign requested address) while connecting to upstream,
client: 192.168.200.10, server: www.xxxxxxxxxx.com, request:
"GET /geocode?address=94131 HTTP/1.0", upstream: "memcached://127.0.0.1:11211",
host: "www.xxxxxxxxxx.com"

I believe this is a consequence of having too many sockets in TIME_WAIT:
each connection from nginx to memcached consumes an ephemeral port, and a
port stuck in TIME_WAIT cannot be handed out again until its timer expires,
so connect() eventually fails with EADDRNOTAVAIL (errno 99).

Finally, I think the original poster was referring to the following kernel
parameter:

net.ipv4.ip_local_port_range = 1024 65000
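
A rough back-of-envelope calculation shows why the stock setting runs out
(assuming the usual 32768-61000 default range and the kernel's 60 second
TIME_WAIT):

# ~28000 usable ports / 60 s in TIME_WAIT ≈ 470 new outgoing connections
# per second before connect() starts failing.
$ sysctl -w net.ipv4.ip_local_port_range="1024 65000"

Widening the range raises that ceiling but does not remove it.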

#4

On Sun, Aug 10, 2008 at 12:09 PM, Tyler K. removed_email_address@domain.invalid
wrote:

What is the operating system?
What do you mean by "tweaked" (details please)?

net.ipv4.ip_local_port_range = 1024 65000

Indeed. Also:
net.ipv4.ip_local_port_range = 10000 65535 (since I have memcached and
other services on ports 9000-10000)
net.ipv4.tcp_tw_recycle = 1 (fast recycling of TIME_WAITs)
net.ipv4.tcp_tw_reuse = 1 (reuse TIME_WAITs when 'safe' to do so)

I would advise anyone to read http://www.ietf.org/rfc/rfc1337.txt
before committing these to production.
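
If you do go ahead with them, something along these lines applies them at
runtime and keeps them across reboots (a sketch; adjust the values to your
own port layout):

$ sysctl -w net.ipv4.tcp_tw_reuse=1
$ sysctl -w net.ipv4.tcp_tw_recycle=1
$ sysctl -w net.ipv4.ip_local_port_range="10000 65535"

# persist across reboots
$ cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 10000 65535
EOF
$ sysctl -p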

It would seem that persistent backend connections or unix domain socket
support for memcached would resolve this issue.

In my tests I have also noticed that connecting to and disconnecting from
memcached has a real impact on serving content. When I use that approach
on my application backend, my video serving rate maxes out at 2-3 updates
per second; with a persistent connection I can go upwards of 10 fps. It
seems likely that the non-persistent nginx<->memcached connectivity would
also become a bottleneck under high connection rates.
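
A crude way to see the connect/teardown cost from a shell (just a sketch;
it assumes memcached on 127.0.0.1:11211 and a netcat that exits once the
server closes the connection):

# one TCP connection per request
$ time for i in `seq 1000`; do printf 'get k\r\nquit\r\n' | nc 127.0.0.1 11211 >/dev/null; done

# one persistent connection for all 1000 requests
$ time { for i in `seq 1000`; do printf 'get k\r\n'; done; printf 'quit\r\n'; } | nc 127.0.0.1 11211 >/dev/null

The gap between the two timings gives a feel for how much of the
per-request cost is connection setup and teardown.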

Cheers
Kon