TIME_WAITs and sustained connections was: Re: Memcached module -- unix domain socket support? (too m

On Thu, Aug 7, 2008 at 11:03 AM, Chavelle V. [email protected]
wrote:

I'm definitely not an expert, but when I read your post, that's the idea
I had.

From what I read, ncache works by caching items from disk into memory.
The problem I have is that the media changes every 250 ms.

Anyway, an update: I have tweaked the kernel IPv4 parameters, and even
though I have at times 23,000 sockets in TIME_WAIT, they are now being
recycled. The server is sustaining about 50 Mbit/s of traffic going
through memcached->nginx. My connection count hovers around 220 when
using the nginx stats module. Is this the norm? I am wondering if I have
an artificial bottleneck somewhere.
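
(For context: something like the following stub_status block is the usual
way to get that active-connection counter, assuming the stock stub_status
module is what is meant by the nginx stats module; the location name and
the allow/deny lines are placeholders, not the poster's actual config.)

location /nginx_status {
    stub_status on;     # reports active connections, accepts, handled, requests
    allow 127.0.0.1;    # keep the counters off the public site
    deny all;
}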

Looking more into unix domain sockets, I am not sure this would be the
most pertinent solution to the problem. I noticed there was some talk of
persistent connections to upstream servers; is this planned for the
memcached module? (I am using this model on the app-server side to feed
memcached and it is working great.)
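
(For illustration only: assuming a build where the memcached module
accepts unix sockets, as later nginx releases do, the config would look
roughly like this; the socket path and location name are made up.)

location /media {
    set            $memcached_key $uri;
    memcached_pass unix:/tmp/memcached.sock;   # memcached started with -s /tmp/memcached.sock
    error_page     404 502 = @app_fallback;    # fall through to the application on a miss
}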

Cheers
Kon

What is the operating system?
What do you mean by "tweaked" (details please)?

regards,
lix

What is the operating system?
What do you mean by "tweaked" (details please)?

I am seeing the same behavior on CentOS 4.5 with nginx 0.6.32 and
memcached 1.2.6.

After running load tests using ApacheBench, I see a high number of
TIME_WAIT connections in netstat. The load test hits nginx, which
fetches a static response from memcached.
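
(Roughly the shape of the location block such a test exercises; the key
scheme below is an assumption, since the real key depends on how the
application populates memcached.)

location /geocode {
    default_type   text/html;
    set            $memcached_key "$uri?$args";   # assumed key; must match what the app stores
    memcached_pass 127.0.0.1:11211;
}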

$ uname -a
Linux geocode1.admin.zvents.com 2.6.9-55.ELsmp #1 SMP Wed May 2 14:04:42 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
CentOS release 4.5 (Final)

$ netstat -an |grep -c TIME_WAIT
6

$ ab -c 4 -n 20000 'http://localhost/geocode?address=94131'

$ netstat -an |grep -c TIME_WAIT
34444

If I run the test again, I see a number of errors in error.log that look
like this:

2008/01/31 20:49:30 [crit] 24806#0: *70538 connect() to 127.0.0.1:11211
failed (99: Cannot assign requested address) while connecting to
upstream, client: 192.168.200.10, server: www.xxxxxxxxxx.com, request:
"GET /geocode?address=94131 HTTP/1.0", upstream:
"memcached://127.0.0.1:11211",
host: "www.xxxxxxxxxx.com"

I believe this is a consequence of having too many sockets in TIME_WAIT.
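
(A quick way to confirm that is to compare the ephemeral port range with
the number of sockets parked in TIME_WAIT toward memcached; error 99,
EADDRNOTAVAIL, is what connect() returns when no local port is free.)

$ cat /proc/sys/net/ipv4/ip_local_port_range
$ netstat -ant | awk '$5 == "127.0.0.1:11211" && $6 == "TIME_WAIT"' | wc -l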

Finally, I think the original poster was referring to the following
kernel parameter:

net.ipv4.ip_local_port_range = 1024 65000
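
(It can be changed at runtime with sysctl, or added to /etc/sysctl.conf
and reloaded; the value here is just the one quoted above.)

$ sysctl -w net.ipv4.ip_local_port_range="1024 65000"
$ sysctl -p    # reload /etc/sysctl.conf if the setting is persisted there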

On Sun, Aug 10, 2008 at 12:09 PM, Tyler K. [email protected]
wrote:

What is the operating system?
What do you mean by "tweaked" (details please)?

net.ipv4.ip_local_port_range = 1024 65000

Indeed. Also:
net.ipv4.ip_local_port_range = 10000 65535 (since I have memcached and
other services on ports 9000-10000)
net.ipv4.tcp_tw_recycle = 1 (fast recycling of TIME_WAIT sockets)
net.ipv4.tcp_tw_reuse = 1 (reuse TIME_WAIT sockets when 'safe' to do so)

I would advise anyone to read http://www.ietf.org/rfc/rfc1337.txt
before committing these to production.
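
(A sketch of persisting these in /etc/sysctl.conf and applying them
without a reboot; the values are the ones quoted above.)

# /etc/sysctl.conf
net.ipv4.ip_local_port_range = 10000 65535
# aggressive TIME_WAIT handling; see the RFC 1337 caveat above
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

$ sysctl -p    # apply without a reboot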

It would seem that persistent backend connections or unix domain socket
support for memcached would resolve this issue.

In my tests I have also noticed that connecting and disconnecting from
memcached has a performance impact on serving content. When I use this
connect-per-request mechanism on my application backend, my video serving
frame rate maxes out at 2-3 updates per second; with a persistent
connection I can go upwards of 10 fps. It seems likely that the
non-persistent nginx<->memcached connectivity would also be a bottleneck
under high connection rates.
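
(For what it is worth, in later nginx versions the upstream keepalive
directive lets those memcached connections be reused along roughly these
lines; the pool and location names here are placeholders.)

upstream memcached_pool {
    server    127.0.0.1:11211;
    keepalive 32;              # keep up to 32 idle connections open per worker process
}

location /cache {
    set            $memcached_key $uri;
    memcached_pass memcached_pool;
}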

Cheers
Kon