Nginx reload fails with host not found in upstream

hello –

nginx will not reload on some of our proxy servers, but does on others. all are running the same version: nginx/1.0.15. the reload fails with this error:

[emerg] 26903#0: host not found in upstream "webappNNx:8080" in /etc/nginx/upstream.conf:N

the issue appears to be related to nginx's ability to resolve a hostname. our proxy servers use BIND servers that we run ourselves. the BIND servers are returning answers just fine afaict. and when i reproduce this problem on a proxy server, i sniff the network and can confirm the proxy is asking the nameserver for an A record, and gets that answer back successfully.

there is a workaround i found, but i would really rather not resort to it: putting the backend (aka upstream :<) app nodes into /etc/hosts. i have also heard suggestions to put the backend nodes' IPs into the proxy pool file (upstream.conf), but again, i'd rather not, because that's not human readable, especially when firefighting. i'm hoping there is a better solution out there than these workarounds.
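for completeness, one direction that might avoid both workarounds (a sketch only, with our names used hypothetically; the resolver directive plus a variable in proxy_pass makes nginx re-resolve at request time instead of only at reload, but it bypasses the upstream{} block, so it would cost us the pool and the health-check module):

```nginx
# hypothetical sketch: per-request resolution instead of reload-time resolution.
# 10.24.27.66 is one of our BIND servers; valid= caps how long the answer is cached.
resolver 10.24.27.66 valid=30s;
location / {
    set $backend "webapp02c.prod.romeovoid.com";
    proxy_pass http://$backend:8080;
}
```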

we are using a third-party module: yaoweibin/nginx_upstream_check_module (health checks upstreams for nginx). no, i have not tried to reproduce this problem without the module; i don't know how i would, since we need the functionality it provides. and yes, i will follow up with the module author.

any help? thank you very much in advance. all the gory details follow.

kallen

straces available upon request :>

a proxy server where the problem does occur:

i'd like to note that the nginx parent on this server has been running for about 6 months.

i try to reload, but the reload will not complete due to the error:

[emerg] 26903#0: host not found in upstream "webapp04a:8080" in /etc/nginx/upstream.conf:3

12/07 01:28[root@proxy2-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

12/07 01:28[root@proxy2-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root     20569  0.0  0.2  25652  5364 ?  Ss  Jun20  0:03 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx     3401  0.4  0.8  37056 15960 ?  S   Dec05  8:39   \_ nginx: worker process
nginx     3402  0.4  1.1  40916 19836 ?  S   Dec05  8:36   \_ nginx: worker process

12/07 01:29[root@proxy2-prod-ue1 ~]# cat /etc/nginx/upstream.conf

## Tomcat via HTTP
upstream tomcats_http {
    server webapp02c:8080 max_fails=2;
    server webapp06c:8080 max_fails=2;
    server roapp02c:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}

12/07 01:29[root@proxy2-prod-ue1 ~]# tcpdump -nvv -i eth0 -s0 -X port 53 and host 10.24.27.66

12/07 01:30[root@proxy2-prod-ue1 ~]# strace -f -s 2048 -ttt -T -p 20569 -o nginx-parent-strace
Process 20569 attached - interrupt to quit

12/07 01:27[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:80 #6
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:443 #7
2012/12/07 00:05:29 [debug] 12290#0: counter: B7F38080, 1
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:80 #6
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:443 #7
2012/12/07 01:28:37 [debug] 22928#0: counter: B7F8F080, 1
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:80 #6
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:443 #7
2012/12/07 01:31:44 [debug] 23383#0: counter: B7F56080, 1
2012/12/07 01:31:44 [emerg] 20569#0: host not found in upstream "webapp02c:8080" in /etc/nginx/upstream.conf:3

as soon as that reload fires, i do see nameservice traffic on the wire. so it is NOT a matter of DNS service being unavailable. i note that it does ask for the A record twice; i don't know why.

01:31:44.426376 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 799c!] 18875+ A? webapp02c.prod.romeovoid.com. (44)
    0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52  E…Hx.@.@…N…+R
    0x0010: 0af4 ed55 ed33 0035 0034 2ed6 49bb 0100  …U.3.5.4…I…
    0x0020: 0001 0000 0000 0000 0977 6562 6170 7030  ………webapp0
    0x0030: 3263 0470 726f 6407 7361 6173 7572 6503  2c.prod.romeovoid.
    0x0040: 636f 6d00 0001 0001                      com…
01:31:44.427301 IP (tos 0x0, ttl 63, id 42228, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 18875* q: A? webapp02c.prod.romeovoid.com. 1/2/2 webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
    0x0000: 4500 009c a4f4 0000 3f11 a7cc 0af4 ed55  E…?..U
    0x0010: 0af5 2b52 0035 ed33 0088 e8c5 49bb 8580  …+R.5.3…I…
    0x0020: 0001 0001 0002 0002 0977 6562 6170 7030  …webapp0
    0x0030: 3263 0470 726f 6407 7361 6173 7572 6503  2c.prod.romeovoid.
    0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000  com…
    0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001  .<…s*…
    0x0060: 5180 0006 036e 7331 c016 c016 0002 0001  Q…ns1…
    0x0070: 0001 5180 0006 036e 7332 c016 c048 0001  …Q…ns2…H…
    0x0080: 0001 0000 003c 0004 0ac0 530e c05a 0001  …<…S…Z…
    0x0090: 0001 0000 003c 0004 0af4 ed55            …<…U
01:31:44.427420 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 8c21!] 50344+ A? webapp02c.prod.romeovoid.com. (44)
    0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52  E…Hx.@.@…N…+R
    0x0010: 0af4 ed55 ed33 0035 0034 2ed6 c4a8 0100  …U.3.5.4…
    0x0020: 0001 0000 0000 0000 0977 6562 6170 7030  …webapp0
    0x0030: 3263 0470 726f 6407 7361 6173 7572 6503  2c.prod.romeovoid.
    0x0040: 636f 6d00 0001 0001                      com…
01:31:44.428050 IP (tos 0x0, ttl 63, id 42229, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 50344* q: A? webapp02c.prod.romeovoid.com. 1/2/2 webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS ns2.prod.romeovoid.com., prod.romeovoid.com. NS ns1.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
    0x0000: 4500 009c a4f5 0000 3f11 a7cb 0af4 ed55  E…?..U
    0x0010: 0af5 2b52 0035 ed33 0088 6dd8 c4a8 8580  …+R.5.3…m…
    0x0020: 0001 0001 0002 0002 0977 6562 6170 7030  …webapp0
    0x0030: 3263 0470 726f 6407 7361 6173 7572 6503  2c.prod.romeovoid.
    0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000  com…
    0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001  .<…s*…
    0x0060: 5180 0006 036e 7332 c016 c016 0002 0001  Q…ns2…
    0x0070: 0001 5180 0006 036e 7331 c016 c05a 0001  …Q…ns1…Z…
    0x0080: 0001 0000 003c 0004 0ac0 530e c048 0001  …<…S…H…
    0x0090: 0001 0000 003c 0004 0af4 ed55            …<…U
01:31:44.428142 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 1632!] 45086+ A? webapp06c.prod.romeovoid.com. (44)
    0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52  E…Hx.@.@…N…+R
    0x0010: 0af4 ed55 ed33 0035 0034 2ed6 b01e 0100  …U.3.5.4…
    0x0020: 0001 0000 0000 0000 0977 6562 6170 7030  …webapp0
    0x0030: 3663 0470 726f 6407 7361 6173 7572 6503  6c.prod.romeovoid.
    0x0040: 636f 6d00 0001 0001                      com…
01:31:44.428791 IP (tos 0x0, ttl 63, id 42230, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 45086* q: A? webapp06c.prod.romeovoid.com. 1/2/2 webapp06c.prod.romeovoid.com. A 10.195.76.80 ns: prod.romeovoid.com. NS ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
[snip]

the workaround: put all backend nodes (from upstream.conf) into /etc/hosts :<

12/07 01:34[root@proxy2-prod-ue1 ~]# tail -3 /etc/hosts
10.51.23.17 webapp02c.prod.romeovoid.com webapp02c
10.195.76.80 webapp06c.prod.romeovoid.com webapp06c
10.96.23.87 roapp02c.prod.romeovoid.com roapp02c

and now, it will reload just fine:

12/07 01:34[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:80 #6
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:443 #7
2012/12/07 01:35:39 [debug] 24076#0: counter: B7FCD080, 1
2012/12/07 01:35:39 [debug] 20569#0: http upstream check, find oshm_zone:092C6390, opeers_shm: B7451000
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.51.23.17:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.195.76.80:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.96.23.87:8080
2012/12/07 01:35:39 [notice] 20569#0: using the "epoll" event method
2012/12/07 01:35:39 [notice] 20569#0: start worker processes
2012/12/07 01:35:39 [debug] 20569#0: channel 3:5
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24078
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:0 pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:1 pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: channel 14:15
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24079
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:0 pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:1 pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:2 pid:24078 fd:3
2012/12/07 01:35:39 [debug] 20569#0: child: 0 3401 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 1 3402 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 2 24078 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: child: 3 24079 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: sigsuspend
2012/12/07 01:35:39 [debug] 24078#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24079#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24078#0: malloc: 0931D3E0:102400

a proxy server where the problem does NOT occur:

i'd like to note that the nginx parent on this server has been running for only about 1 month.

12/07 01:04[root@proxy5-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

12/07 01:40[root@proxy5-prod-ue1 ~]# cat /etc/nginx/upstream.conf

## Tomcat via HTTP
upstream tomcats_http {
    server webapp09e:8080 max_fails=2;
    server webapp10e:8080 max_fails=2;
    server roapp05e:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}
12/07 01:40[root@proxy5-prod-ue1 ~]# grep webapp /etc/hosts
12/07 01:41[root@proxy5-prod-ue1 ~]# # nothing as expected

12/07 01:42[root@proxy5-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root      4817  0.0  0.3 106184  5528 ?  Ss  Nov07  0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx     8396  0.6  0.8 116692 15488 ?  S   00:36  0:25   \_ nginx: worker process
nginx     8397  0.6  0.8 116296 15096 ?  S   00:36  0:25   \_ nginx: worker process

12/07 01:42[root@userproxy5-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:42:44 [debug] 8396#0: posted event 0000000000000000
2012/12/07 01:42:44 [debug] 8396#0: worker cycle
2012/12/07 01:42:44 [debug] 8396#0: accept mutex locked
2012/12/07 01:42:44 [debug] 8396#0: epoll timer: 399
2012/12/07 01:42:44 [notice] 4817#0: signal 1 (SIGHUP) received, reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: wake up, sigio 0
2012/12/07 01:42:44 [notice] 4817#0: reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007F1BA0:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000081FB60:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000008C1980:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 6, 00000000008C1980, 4096, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006E0A80:6912
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007E59C0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007A0610:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000731E00:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000774AD0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000873750:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000781760:4280
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008D1170:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007EEA40:4096
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: malloc: 000000000080F300:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 8, 000000000080F300, 3463, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006DCA90:4096
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007642B0:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008B5F40:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000075B000:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000087E390:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: include upstream.conf
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/upstream.conf

our config

upstream.conf:

## Tomcat via HTTP
upstream tomcats_http {
    server webapp02c:8080 max_fails=2;
    server webapp06c:8080 max_fails=2;
    server roapp02c:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}

nginx.conf:

user nginx;
worker_processes 2;
syslog local2 nginx;
error_log syslog:warn|/var/log/nginx/error.log;
pid /var/run/nginx.pid;
worker_rlimit_core 500M;
working_directory /var/coredumps/;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
proxy_buffers 8 16k;
proxy_buffer_size 32k;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for"';
access_log syslog:warn|/var/log/nginx/access.log main;
sendfile on;
keepalive_timeout 65;
gzip on;
server {
listen 80;
server_name _;
# put X-Purpose: preview into the trash. thank you Safari
if ($http_x_purpose ~* "preview") {
return 444;
break;
}
# Module ngx_http_stub_status_module
location /nginx-status {
stub_status on;
access_log off;
allow 10.0.0.0/8;
allow 127.0.0.1;
deny all;
}
location /upstream-status {
check_status;
access_log off;
allow 10.0.0.0/8;
allow 127.0.0.1;
deny all;
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/error;
}
error_page 403 /403.html;
location = /403.html {
root /usr/share/nginx/error;
}
error_page 500 502 504 /500.html;
location = /500.html {
root /usr/share/nginx/error;
}
error_page 503 /503.html;
location = /503.html {
root /usr/share/nginx/error;
}
set $global_ssl_redirect 'yes';
if ($request_filename ~ "nginx-status") {
set $global_ssl_redirect 'no';
}
if ($request_filename ~ "upstream-status") {
set $global_ssl_redirect 'no';
}
if ($global_ssl_redirect ~* '^yes$') {
rewrite ^ https://$host$request_uri? permanent;
break;
}
}

## Keep upstream defs in a separate file for easier pool membership control
include upstream.conf;
server {
listen 443;
server_name _;
# put X-Purpose: preview into the trash. thank you Safari
if ($http_x_purpose ~* "preview") {
return 444;
break;
}
ssl on;
ssl_certificate certs/wildcard_void_com.crt;
ssl_certificate_key certs/wildcard_void_com.key;
ssl_protocols SSLv3 TLSv1;
ssl_ciphers HIGH:!ADH:!MD5;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
set_real_ip_from 10.0.0.0/8;
real_ip_header X-Forwarded-For;
add_header Cache-Control public;
## Tomcat via HTTP
location / {
proxy_pass http://tomcats_http;
proxy_connect_timeout 10s;
proxy_next_upstream error invalid_header http_503 http_502 http_504;
proxy_set_header Host $host;
proxy_set_header X-Server-Port $server_port;
proxy_set_header X-Server-Protocol https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Strict-Transport-Security max-age=315360000;
proxy_set_header X-Secure true;
proxy_set_header Transfer-Encoding ""; # OPS-475 remove if/when we update/punt Tomcat
if ($request_uri ~* "\.(ico|css|js|gif|jpe?g|png)") {
expires 365d;
break;
}
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/error;
}
error_page 403 /403.html;
location = /403.html {
root /usr/share/nginx/error;
}
error_page 500 502 504 /500.html;
location = /500.html {
root /usr/share/nginx/error;
}
error_page 503 /503.html;
location = /503.html {
root /usr/share/nginx/error;
}
}
}

Posted at Nginx Forum:

I think your BIND server is suspect. When reloading, nginx just calls gethostbyname(), which is a normal glibc call.

Can you write a simple C program that uses gethostbyname(), to confirm?

2012/12/7 groknaut [email protected]

12/07 05:31[root@proxy2-prod-ue1 ~]# cat gethostbyname.c
#include <stdio.h>
#include <netdb.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        printf("usage: %s [hostname]\n", argv[0]);
        return 1;
    }

    struct hostent *lh = gethostbyname(argv[1]);

    if (lh) {
        puts(lh->h_name);
        return 0;
    } else {
        herror("gethostbyname");
        return 1;
    }
}
12/07 05:31[root@proxy2-prod-ue1 ~]# gcc -o gethostbyname gethostbyname.c
12/07 05:31[root@proxy2-prod-ue1 ~]# ./gethostbyname webapp02c
webapp02c.prod.romeovoid.com
12/07 05:31[root@proxy2-prod-ue1 ~]# ./gethostbyname webapp02c.prod.romeovoid.com
webapp02c.prod.romeovoid.com

Posted at Nginx Forum:

maybe this one's better:

12/07 06:19[root@proxy2-prod-ue1 ~]# cat gethostbyname.c
#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    // skip 0 because that is the program name
    for (int i = 1; i < argc; ++i) {

        struct hostent *lh = gethostbyname(argv[i]);

        if (lh) {
            struct in_addr **addr_list;
            addr_list = (struct in_addr **) lh->h_addr_list;

            printf("%-14s %s\n",
                inet_ntoa(*addr_list[0]),
                lh->h_name
            );
        }
        else {
            herror("gethostbyname");
        }
    }

    return 0;
}
12/07 06:20[root@proxy2-prod-ue1 ~]# gcc -std=c99 gethostbyname.c -o gethostbyname.bin
12/07 06:20[root@proxy2-prod-ue1 ~]# ./gethostbyname.bin webapp02c.prod.romeovoid.com
10.51.23.17    webapp02c.prod.romeovoid.com
12/07 06:21[root@proxy2-prod-ue1 ~]# ./gethostbyname.bin webapp06c.prod.romeovoid.com
10.195.76.80   webapp06c.prod.romeovoid.com

our DNS does work…
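for comparison, a variant that uses getaddrinfo() instead of gethostbyname() (a sketch only; just to check whether the newer resolver path behaves any differently from the one nginx exercises at reload):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s hostname...\n", argv[0]);
        return 1;
    }

    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;       /* A records only, like the upstream lookup */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags    = AI_CANONNAME;  /* ask for the canonical name, like h_name */

    for (int i = 1; i < argc; ++i) {
        struct addrinfo *res;
        int rc = getaddrinfo(argv[i], NULL, &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "%s: %s\n", argv[i], gai_strerror(rc));
            continue;
        }
        struct sockaddr_in *sin = (struct sockaddr_in *) res->ai_addr;
        printf("%-14s %s\n", inet_ntoa(sin->sin_addr),
               res->ai_canonname ? res->ai_canonname : argv[i]);
        freeaddrinfo(res);
    }
    return 0;
}
```

compile with gcc -std=c99 as before.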

Posted at Nginx Forum:

i ran the test, and i see no problems:

12/07 22:07[root@proxy2-prod-ue1 ~]# grep app /etc/hosts
12/07 22:07[root@proxy2-prod-ue1 ~]# time ./gethostbyname.bin webapp02c webapp02c webapp06c webapp06c roapp02c roapp02c
10.51.23.17 webapp02c.prod.romeovoid.com
10.51.23.17 webapp02c.prod.romeovoid.com
10.195.76.80 webapp06c.prod.romeovoid.com
10.195.76.80 webapp06c.prod.romeovoid.com
10.96.23.87 roapp02c.prod.romeovoid.com
10.96.23.87 roapp02c.prod.romeovoid.com

real 0m0.009s
user 0m0.000s
sys 0m0.000s

Posted at Nginx Forum:

On Fri, Dec 07, 2012 at 01:22:50AM -0500, groknaut wrote:

> [...]
> webapp06c.prod.romeovoid.com
> 10.195.76.80   webapp06c.prod.romeovoid.com
>
> our DNS does work…

As was already said, nginx internally does gethostbyname() to resolve hostnames during configuration parsing, so if the problem disappears when you move the hostnames into /etc/hosts, I would not suspect nginx. (It does gethostbyname() twice due to how it's currently coded, so that is expected.)

To emulate what nginx does internally when processing this
upstream, run it like this:

./gethostbyname.bin webapp02c webapp02c webapp06c webapp06c roapp02c roapp02c

WITHOUT hostnames in /etc/hosts. Do it several times in a row.

If that doesn't reveal the problem, do you have the ability to recompile nginx from source?
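To run that repeatedly, something along these lines should work (a sketch; `getent hosts` goes through the same glibc/NSS lookup path as gethostbyname(), so it can stand in if the test binary is not at hand):

```shell
# repeat the lookups many times WITHOUT /etc/hosts entries;
# an intermittent resolver failure should show up as a "failed" line
for i in $(seq 1 100); do
    getent hosts webapp02c webapp02c webapp06c webapp06c roapp02c roapp02c \
        >/dev/null || echo "lookup failed on iteration $i"
done
```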