Hi,
after restarting nginx I find
2012/11/07 10:24:02 [alert] 23635#0: 512 worker_connections are not
enough
2012/11/07 10:24:02 [alert] 23636#0: 512 worker_connections are not
enough
2012/11/07 10:24:04 [alert] 23618#0: cache manager process 23635 exited
with fatal code 2 and cannot be respawned
in my logs. It seems like this error came up after adding more than 2500
virtual hosts, each consisting of two server blocks, one for http, and
one for https.
Now I don't quite understand these messages. In my nginx.conf I have
user www-data;
worker_processes 16;
pid /var/run/nginx.pid;
worker_rlimit_nofile 65000;
events {
worker_connections 2000;
use epoll;
# multi_accept on;
}
so that should be enough worker_connections. Why am I still getting this
message?
For the other message regarding the cache manager, I found this
http://www.ruby-forum.com/topic/519162
thread, where Maxim Dounin suggests that it results from the kernel not
supporting eventfd(). But as far as I understand this is only an issue
with kernels before 2.6.18. I use 2.6.32 and my kernel config clearly
states
CONFIG_EVENTFD=y
Here is the nginx version and configure options:
root@debian:~# nginx -V
nginx version: nginx/1.2.4
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx/ --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid
--lock-path=/var/run/nginx.lock
--http-client-body-temp-path=/var/cache/nginx/client_temp
--http-proxy-temp-path=/var/cache/nginx/proxy_temp
--http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp
--http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp
--http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx
--group=nginx --with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_sub_module
--with-http_dav_module --with-http_flv_module --with-http_mp4_module
--with-http_gzip_static_module --with-http_random_index_module
--with-http_secure_link_module --with-http_stub_status_module
--with-mail --with-mail_ssl_module --with-file-aio --with-ipv6
Any ideas?
Isaac
on 2012-11-07 10:50
on 2012-11-08 14:54
So I also tried version 1.2.1 from Debian backports, which produced the
same error. I tried on openSUSE 12.2, which worked fine:
nginx version: nginx/1.0.15
built by gcc 4.7.1 20120713 [gcc-4_7-branch revision 189457] (SUSE Linux)
TLS SNI support enabled
configure arguments: --prefix=/usr/ --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid
--lock-path=/var/run/nginx.lock
--http-client-body-temp-path=/var/lib/nginx/tmp/
--http-proxy-temp-path=/var/lib/nginx/proxy/
--http-fastcgi-temp-path=/var/lib/nginx/fastcgi/
--http-uwsgi-temp-path=/var/lib/nginx/uwsgi/
--http-scgi-temp-path=/var/lib/nginx/scgi/ --user=nginx --group=nginx
--with-rtsig_module --with-select_module --with-poll_module --with-ipv6
--with-file-aio --with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_xslt_module
--with-http_image_filter_module --with-http_geoip_module
--with-http_sub_module --with-http_dav_module --with-http_flv_module
--with-http_gzip_static_module --with-http_random_index_module
--with-http_secure_link_module --with-http_degradation_module
--with-http_stub_status_module --with-http_perl_module
--with-perl=/usr/bin/perl --with-mail --with-mail_ssl_module --with-pcre
--with-libatomic --add-module=passenger/ext/nginx --with-md5=/usr
--with-sha1=/usr --with-cc-opt='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fstack-protector'
So could it be that this is an issue with the 1.2 series?
Isaac
on 2012-11-08 15:10
On Nov 7, 2012, at 13:49, Isaac Hailperin wrote:
> Now I don't quite understand these messages. In my nginx.conf I have
>
> so that should be enough worker_connections. Why am I still getting this message?
>
> For the other message regarding the cache manager, I found this
> http://www.ruby-forum.com/topic/519162
> thread, where Maxim Dounin suggests that it results from the kernel not supporting eventfd(). But as far as I understand this is only an issue with kernels before 2.6.18. I use 2.6.32 and my kernel config clearly states
> CONFIG_EVENTFD=y
These messages have no relation to eventfd().
The process with pid 23636 is probably the cache loader. Neither the cache manager nor the cache loader uses the configured worker_connections number, since they do not process connections at all. However, they need one connection slot to communicate with the master process.
512 connections may be taken by listen directives if they use different addresses, or by resolvers if you defined a resolver in every virtual host.
A quick workaround is to define just a single resolver at http level.
--
Igor Sysoev
http://nginx.com/support.html
on 2012-11-08 16:13
>
> These messages have no relation to eventfd().
>
> A process with pid of 23636 is probably cache loader. Both cache manager and loader
> do not use configured worker_connection number since they do not process connections
> at all. However, they need one connection slot to communicate with master process.
>
> 512 connections may be taken by listen directives if they use different addresses,
> or by resolvers if you defined a resolver in every virtual host.
> A quick workaround is to define just a single resolver at http level.
Hm, there were no resolvers defined in the virtual hosts. I tried adding
resolver 127.0.0.1;
to my http section, but that did not help.
Also, if resolvers were the problem, it should also happen with other nginx builds, like the one I tested on openSUSE; see my reply earlier today.
Here is my config, including one vhost:
user www-data;
worker_processes 16;
pid /var/run/nginx.pid;
worker_rlimit_nofile 65000;
events {
    use epoll;
    worker_connections 2000;
    # multi_accept on;
}
http {
    ##
    # Basic Settings
    ##
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;
    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    ##
    # Logging Settings
    ##
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log debug;
    #error_log /var/log/nginx/error.log;
    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    # gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    # Because we have a lot of server_names, we need to increase
    # server_names_hash_bucket_size
    # (http://nginx.org/en/docs/http/server_names.html)
    server_names_hash_max_size 32000;
    server_names_hash_bucket_size 1024;
    # raise default values for php
    client_max_body_size 20M;
    client_body_buffer_size 128k;
    ##
    # Virtual Host Configs
    ##
    include /etc/nginx/conf.d/*.conf;
    include /var/www3/acme_cache/load_balancer/upstream.conf;
    include /etc/nginx/sites-enabled/*;
    index index.html index.htm ;
    ##
    # Proxy Settings
    ##
    # include hostname in request to backend
    proxy_set_header Host $host;
    # only honor internal Caching policies
    proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
    # hopefully fixes an issue with cache manager dying
    resolver 127.0.0.1;
}
Then in /etc/nginx/sites-enabled/ there is e.g.
server {
    server_name www.acme.eu acmeblabla.eu;
    listen 45100;
    ssl on;
    ssl_certificate /etc/nginx/ssl/acme_eu.crt;
    ssl_certificate_key /etc/nginx/ssl/acme_eu.key;
    access_log /var/log/www/m77/acmesystems_de/log/access.log;
    error_log /var/log/nginx/vhost_error.log;
    proxy_cache acme-cache;
    proxy_cache_key "$scheme$host$proxy_host$uri$is_args$args";
    proxy_cache_valid 200 302 60m;
    proxy_cache_valid 404 10m;
    location ~* \.(jpg|gif|png|css|js) {
        try_files $uri @proxy;
    }
    location @proxy {
        proxy_pass https://backend-www.acme.eu_p45100;
    }
    location / {
        proxy_pass https://backend-www.acme.eu_p45100;
    }
}
upstream backend-www.acme.eu_p45100 {
    server 10.1.1.25:45100;
    server 10.1.1.26:45100;
    server 10.1.1.27:45100;
    server 10.1.1.28:45100;
    server 10.1.1.15:45100;
    server 10.1.1.18:45100;
    server 10.1.1.20:45100;
    server 10.1.1.36:45100;
    server 10.1.1.39:45100;
    server 10.1.1.40:45100;
    server 10.1.1.42:45100;
    server 10.1.1.21:45100;
    server 10.1.1.22:45100;
    server 10.1.1.23:45100;
    server 10.1.1.29:45100;
    server 10.1.1.50:45100;
    server 10.1.1.43:45100;
    server 10.1.1.45:45100;
    server 10.1.1.46:45100;
    server 10.1.1.19:45100;
    server 10.1.1.10:45100;
}
Isaac
on 2012-11-08 16:54
> So could it be that this is an issue with the 1.2 Series?
Ok, this is not the case: I tried 1.0.15 built by hand on Debian, and
have the same issue.
Isaac
on 2012-11-09 14:15
Refining my observations: it's not an issue of version or OS; those were wrong observations on my side.
But: of the approx. 5000 vhosts, there are about 1000 that do SSL, each on a different (high) port.
Without the SSL vhosts, I have about 1000 open files for nginx (lsof |grep nginx|wc), and nginx runs fine.
With the SSL vhosts, I have about 17000 open files, and I get the errors.
Does that ring a bell somewhere?
Also, 17000 is about 16 (number of worker processes) * 1000 (number of SSL hosts) + 1000 (open files without SSL).
I also wonder where the 512 worker_connections from the error message come from. There is no such number in my config. Is it hardcoded somewhere?
Isaac
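The arithmetic in the post above can be written out as a small sketch. It assumes that lsof lists every listening socket once per worker process (each worker holds its own copy of the descriptor), which is consistent with the numbers reported but is not confirmed elsewhere in the thread. The function name is made up for illustration.

```python
def estimated_open_files(workers: int, listen_sockets: int, baseline: int) -> int:
    """Rough lsof line count, assuming each worker holds every listening socket.

    workers        -- number of nginx worker processes (16 in the post)
    listen_sockets -- vhosts listening on their own port (about 1000 in the post)
    baseline       -- open files observed without the SSL vhosts (about 1000)
    """
    return workers * listen_sockets + baseline


# Figures taken from the post: 16 workers, ~1000 SSL vhosts, ~1000 baseline files.
print(estimated_open_files(16, 1000, 1000))
```

With the figures from the post this gives 17000, matching the observed lsof count.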
on 2012-11-09 17:27
On 11/9/12 5:15 PM, Isaac Hailperin wrote:
[...]
> I also wonder where the 512 worker_connections from the error
> message come from. There is no such number in my config. Is it
> hardcoded somewhere?
>
http://nginx.org/en/docs/ngx_core_module.html#work...
It's the default number of worker_connections.
--
Maxim Konovalov
+7 (910) 4293178
http://nginx.com/support.html
on 2012-11-09 19:33
On 11/09/2012 05:27 PM, Maxim Konovalov wrote:
> On 11/9/12 5:15 PM, Isaac Hailperin wrote:
> [...]
>> I also wonder where the 512 worker_connections from the error
>> message come from. There is no such number in my config. Is it
>> hardcoded somewhere?
>>
> http://nginx.org/en/docs/ngx_core_module.html#work...
>
> It's a default number of worker_connections.
Yes, but if I specify a different number, like
http://www.ruby-forum.com/topic/4407591#1083581
this should be different.
Now this could lead to the conclusion that nginx is not reading that file, but nginx -t clearly reports that it does. Also, if I introduce syntax errors in that file, nginx complains.
As Igor Sysoev suggested earlier
http://www.ruby-forum.com/topic/4407591#1083572
the worker_connections parameter might not be related, since the cache manager and loader also use connections. If these are hard-coded to a maximum of 512, this might be the cause: there are exactly 1002 vhosts which each listen on a different port. Now it's not 1024, which would be 512*2, but maybe there is some overhead that brings me to this limit?
If my thinking is correct (?), is there a way to overcome this limit? (Other than using just one port for SSL ... it would mean using different IP addresses, which would have the same effect, I guess?)
Any thoughts on this are welcome.
Isaac
on 2012-11-09 19:51
What does 'cat /proc/sys/fs/file-nr' say?
--
Maxim Konovalov
+7 (910) 4293178
http://nginx.com/support.html
on 2012-11-09 19:56
On 11/09/2012 07:52 PM, Maxim Konovalov wrote:
> What does 'cat /proc/sys/fs/file-nr' say?
>
cat /proc/sys/fs/file-nr
1696 0 205028
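For readers unfamiliar with this file: on Linux, /proc/sys/fs/file-nr holds three numbers, the allocated file handles, the allocated-but-unused handles, and the system-wide maximum. The sketch below parses the output quoted above; the class and function names are invented for illustration.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated system-wide
    unused: int     # allocated but unused handles
    maximum: int    # system-wide ceiling (fs.file-max)


def parse_file_nr(text: str) -> FileNr:
    """Parse the three whitespace-separated integers of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


# Output quoted in the post above.
sample = parse_file_nr("1696 0 205028")
print(f"{sample.allocated} of {sample.maximum} system-wide handles allocated")
```

Read this way, the system-wide limit looks far from exhausted at the moment of the check. That would be in line with errno 24 (EMFILE), which denotes a per-process descriptor limit rather than the system-wide one, though the thread itself does not settle this point.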
on 2012-11-09 20:37
On 09.11.2012 19:33, Isaac Hailperin wrote:

I did several hours of testing today with Isaac, and there are two problems.

PROBLEM/BUG ONE:

First of all: the customer has 1000 SSL hosts on the nginx server, so he wants to have 1000 listeners on TCP ports. But the cache manager isn't able to open that many listeners; it crashes after 512 open listeners. It looks very much like the cache manager doesn't read the worker_connections setting from nginx.conf. We configured
worker_connections 10000;
there, but the cache manager crashes with
2012/11/09 17:53:11 [alert] 9345#0: 512 worker_connections are not enough
2012/11/09 17:53:12 [alert] 9330#0: cache manager process 9344 exited with fatal code 2 and cannot be respawned

I did some testing: with 505 SSL hosts on the server (= 505 listener sockets) everything works fine, but 515 listener sockets aren't possible.

It's easy to reproduce: just define 515 SSL domains with a different TCP port for every domain. :-)

Looks like nobody had the idea before that "somebody" (TM) could run more than twice a /24 network's worth of IPs on one single host. In fact, this does not happen in normal life...

But for historical reasons (TM) our customer uses ONE IP address and several TCP ports, so he doesn't have a problem running so many different SSL hosts on one system. And this is the special situation where we can see the bug (?): the cache manager ignores the worker_connections setting (?) when it tries to open all the listeners and the related cache files/sockets.

So: Looks like a bug? Who can help? We need help...

PROBLEM/BUG TWO:

With 16 workers for 1000 SSL domains with 1000 listeners, we can see 16 * 1000 open TCP listeners on that system, because every worker opens its own listeners (?).

When we reach the magical barrier of 16386 open listeners (lsof -i | grep -c nginx), nginx runs into some kind of file limitation:
2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "worker process" (24: Too many open files)
2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "cache manager process" (24: Too many open files)
2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "cache loader process" (24: Too many open files)

It's very easy to see that the limitation is tied to 16,386 open files and sockets from nginx. But I can't find where this limitation comes from. "ulimit -n" is set to 100,000; everything looks fine and should work with many more open files than just 16K.

Could it be that "nobody" (TM) expected that "somebody" (TM) would run more than 1000 SSL hosts with different TCP ports on 16 worker instances, and that there's some kind of SMALL-INT problem in the nginx code? Could it be that this isn't a limitation of the Linux system, but of some kind of too-small address space for that in nginx?

So: Looks like a bug? Who can help? We need help...

Peer
--
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-42
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin
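The reproduction recipe from the post above (515 SSL server blocks, each on its own TCP port) could be generated with a short script like the following. This is only a sketch: the host names, the certificate paths, and the starting port are placeholders, not values from the thread, and a proxy cache zone must already be configured elsewhere for the cache manager process to start at all.

```python
def make_vhosts(count: int, first_port: int = 45000) -> str:
    """Return `count` nginx server blocks, each listening on its own port.

    Host names and certificate paths below are hypothetical placeholders.
    """
    blocks = []
    for i in range(count):
        port = first_port + i
        blocks.append(
            "server {\n"
            f"    listen {port};\n"
            f"    server_name test{i}.example.invalid;\n"
            "    ssl on;\n"
            "    ssl_certificate /etc/nginx/ssl/test.crt;\n"
            "    ssl_certificate_key /etc/nginx/ssl/test.key;\n"
            "}\n"
        )
    return "\n".join(blocks)


config = make_vhosts(515)
print(config.count("listen "), "listen directives generated")
```

Writing the returned text to a file under the included vhost directory and restarting nginx would then exercise the case described in the post, with more than 512 distinct listening sockets.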
on 2012-11-09 21:07
Hi,
On Nov 9, 2012, at 23:36, Peer Heinlein <p.heinlein@heinlein-support.de> wrote:
> isn't able to open so many listeners. He's crashing after 512 open
> 2012/11/09 17:53:12 [alert] 9330#0: cache manager process 9344 exited
> Looks like nobody had the idea before, that "somebody" (TM) could run
> So: Looks like a bug? Who can help? We need help...
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning
> -n" is set to 100.000, everything's looking fine and should work with
>
>
> --
> Heinlein Support GmbH
Are you looking for a commercial support option to back up your customer's contract with an underpinning contract and vendor support? If that's the case, we've got our support options described here:
http://nginx.com/support.html
Hope this helps
on 2012-11-09 21:16
On 09.11.2012 21:06, Andrew Alexeev wrote:
> Are you looking for a commercial support option to back up your customer's contract with an underpinning contract and vendor support?
First of all, I'm reporting some severe bugs in nginx. nginx should be interested in that, and we *really* spent a lot of time debugging and analyzing this (and this time has NOT been paid).
And:
I've already been on the commercial support page, but there was no "by call" support. I'm not interested in 12-month contracts to solve one single problem.
I do ** NOT ** have a problem paying somebody to fix that. I would have been happy over the last few days to have somebody else familiar with nginx debugging and fixing that.
Unfortunately there was no "by call" support (or I haven't found it).
Feel free to send me an offer off-list about fixing this bug ASAP. I'd appreciate this!
Peer
--
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-42
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin
on 2012-11-09 21:38
On Nov 10, 2012, at 0:15, Peer Heinlein <p.heinlein@heinlein-support.de> wrote:
> On 09.11.2012 21:06, Andrew Alexeev wrote:
>
>
>> Are you looking for a commercial support option to back up your customer's contract with an underpinning contract and vendor support?
>
> First of all I'm reporting some severe bugs in nginx. nginx should be
> interested in that and we *really* spent a lot of time for debugging and
> analyzing this (and, this many time has NOT been paid).
Thanks much. What about also filing a bug report in trac, please? We'd definitely look more into that one and fix it during our normal dev cycle for 1.3.x.
> And:
>
> I've already been on the commercial support page but there was no "by
> call"-support. I'm not interested in 12-month-contracts to solve one
> single problem.
Got it.
> I do ** NOT ** have a problem paying somebody to fix that. I would have
> been happy the last few days having somebody else familiar with nginx
> debugging and fixing that.
>
> Unfortunately there was no "by call"-Support (or I haven't found that).
I'm glad you like what you're doing for a living. Appreciate your efforts debugging nginx too. We fix a lot of things and often; check the changelogs. We don't have enough resources to fix everything ASAP though. If you've got certain commercial commitments, so do we. There are different options on http://nginx.com/support.html, including an option to make a custom inquiry.
on 2012-11-21 08:25
On 11/9/12 10:33 PM, Isaac Hailperin wrote:
>>
> the worker_connection parameter might not be related, since also
> Any thoughts on this are welcome.
>
> Isaac
>
Just for the record -- the issue should be fixed by r4918:
http://trac.nginx.org/nginx/changeset/4918/nginx
--
Maxim Konovalov
+7 (910) 4293178
http://nginx.com/support.html