Upstream prematurely closes cnx => 502 Bad Gateway to client

Nginx’ers,

I'm trying to figure out why I'm randomly seeing requests fail on
nginx 1.7.4 when proxying to an upstream pool, like these:

2016/03/03 10:24:21 [error] 15905#0: *3252814 upstream prematurely
closed connection while reading response header from upstream, client:
10.45.69.25, server: , request: "POST / HTTP/1.1",
upstream: "http://10.45.69.28:8081/", host: "mxosCl:8081"
2016/03/03 10:30:02 [error] 15905#0: *3237632 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: 10.45.69.99, server: , request: "POST / HTTP/1.1",
upstream: "http://10.45.69.25:8081/", host: "mxosCl:8081"
2016/03/03 10:30:02 [error] 15905#0: *3237670 upstream prematurely
closed connection while reading response header from upstream, client:
10.45.69.99, server: , request: "POST / HTTP/1.1",
upstream: "http://10.45.69.28:8081/", host: "mxosCl:8081"
2016/03/03 10:30:03 [error] 15905#0: *3237670 upstream prematurely
closed connection while reading response header from upstream, client:
10.45.69.99, server: , request: "DELETE / HTTP/1.1",
upstream: "http://10.45.69.26:8081/", host: "mxosCl:8081"

The upstream servers don't seem to log any issues. Clients get a 502 Bad
Gateway for such requests.

TIA

/Steffen

My config btw:

user imail;
worker_processes auto;
daemon on;
master_process on;
error_log logs/mos_error.tcp debug_tcp;
error_log logs/mos_error.log;
pid /opt/imail/nginx/logs/mos_nginx.pid;
worker_rlimit_nofile 200000;
worker_rlimit_core 500M;
working_directory /opt/imail/nginx;

events {
    worker_connections 25000;
    use epoll;
    multi_accept off;
}

http {
    log_format testlogs '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$upstream_addr" '
                        '$request_time $upstream_response_time';
    access_log logs/mos_access.log;
    sendfile on;
    keepalive_requests 500000;
    keepalive_timeout 20s;
    tcp_nopush on;
    tcp_nodelay on;
    client_body_buffer_size 128k;
    client_body_temp_path /dev/shm/mos_nginx/client_temp 1 2;
    open_file_cache max=25000 inactive=30s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    upstream mosgenericbackend {
        server mss1:8081;
        server mss2:8081;
        server mss3:8081;
        server mss4:8081;
        healthcheck_enabled;
        healthcheck_delay 1000;
        healthcheck_timeout 1000;
        healthcheck_failcount 2;
        healthcheck_send 'GET /mxos/monitor HTTP/1.0';
        keepalive 300;
    }

    server {
        listen 8081;
        keepalive_timeout 600s;
        client_max_body_size 0;

        location /mxos/ {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass http://mosgenericbackend;
            proxy_connect_timeout 2;
            more_set_headers "Content-Type: application/json";
        }

        location /mxos/health_status {
            healthcheck_status;
        }

        proxy_connect_timeout 60;
        proxy_read_timeout 30;
        proxy_send_timeout 60;
    }
}

My nginx version (vendor patched/supplied for application-specific load
balancing + health checks):

/opt/imail/nginx/sbin/nginx -V

nginx version: nginx/1.7.4
built by gcc 3.4.6 20060404 (Red Hat 3.4.6-3)
TLS SNI support disabled
configure arguments: --with-debug
--add-module=/bld/current/emailmx90_git_blds/emailmx90_contrib_shared/contrib/nginx_1.7.4/source/addons/healthcheck_nginx_upstreams-master
--add-module=/bld/current/emailmx90_git_blds/emailmx90_contrib_shared/contrib/nginx_1.7.4/source/addons/headers-more-nginx-module-0.23
--add-module=/bld/current/emailmx90_git_blds/emailmx90_contrib_shared/contrib/nginx_1.7.4/source/addons/yaoweibin-nginx_tcp_proxy_module-f2156ef


The HTTP specification states that every request shall receive a response.
Your backend closes the connection while nginx is still waiting for, or
reading, the response headers.

The problem definitely comes from your backend. You could use tcpdump
between nginx and your backend to record what they say to each other.
Try to correlate logs from nginx with logs from your backend.

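For example, something along these lines, run on the nginx host, records the
traffic to one of the upstreams; the interface name is an assumption and the
address is taken from your error log, so adjust both to your environment:

# show the ASCII payload on the console; use -w file.pcap instead to keep a
# capture for later inspection in Wireshark
tcpdump -i eth0 -nn -s 0 -A 'host 10.45.69.28 and tcp port 8081'
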
B. R.

@B.R. You're right, it seems my upstream Tomcat instances were resetting (RST)
connections instead of sending a reply.

So far I've improved things a lot by altering the HTTP connector's
keepAliveTimeout value, which was mistakenly expressed in seconds when it
should in fact be milliseconds :wink:

Under heavy load it still occurs, but far less frequently; I will dig deeper
into Tomcat trimming/tuning.
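For reference, both connectionTimeout and keepAliveTimeout on the Tomcat HTTP
connector are expressed in milliseconds, so a value meant as 10 seconds has to
be written as 10000. A minimal sketch, not my actual connector (that follows
further down the thread):

<!-- keepAliveTimeout is in milliseconds: "10" means 10 ms, not 10 s -->
<Connector port="8081" protocol="HTTP/1.1"
           keepAliveTimeout="10000" />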


stefws Wrote:

Seems I'm not alone w/TC issues :wink: missed the link:
http://permalink.gmane.org/gmane.comp.web.haproxy/26860


Hi.

On 04-03-2016 11:00, stefws wrote:

stefws Wrote:

Seems I’m not alone w/TC issues :wink:
missed the link:
http://permalink.gmane.org/gmane.comp.web.haproxy/26860

Well, maybe you have a completely different situation.
Is it possible to build a more recent nginx version without the
additional modules?

Can you please set up Tomcat as described in my answer:

http://permalink.gmane.org/gmane.comp.web.haproxy/26862

Can you please try to run the connector with debug logging on:

http://tomcat.apache.org/tomcat-8.0-doc/logging.html#Using_java.util.logging_(default)

I would try to use these settings:

org.apache.catalina.session.level=ALL
org.apache.coyote.http11.Http11Protocol.level=ALL

Be aware that this will produce a lot of entries in the logs and could have
some impact on performance.
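For example, in conf/logging.properties that could look like the sketch below;
the ConsoleHandler line assumes the stock Tomcat logging.properties, so adjust
the handler names if you log to other handlers:

# sketch, to be merged into the existing conf/logging.properties
# raise the handler level as well, otherwise FINE/FINER records are filtered out
java.util.logging.ConsoleHandler.level = ALL

# the two loggers mentioned above
org.apache.catalina.session.level = ALL
org.apache.coyote.http11.Http11Protocol.level = ALL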

The standard setup also has some rather low limits; maybe you need to
increase them:

http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Standard_Implementation
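I mean, for example, the maxThreads, acceptCount and maxConnections attributes
on the connector; the values below are only placeholders to show where they
go, not a sizing recommendation:

<!-- placeholder values, size them for your own load -->
<Connector port="8081" protocol="HTTP/1.1"
           maxThreads="400" acceptCount="200" maxConnections="10000"
           connectionTimeout="20000" />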

Maybe I have missed it, but which Tomcat version do you use?

Please also set up nginx with the debug option as described here:

http://nginx.org/en/docs/debugging_log.html
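Since your binary is already built with --with-debug (per your nginx -V
output), it should be enough to raise the error_log level in nginx.conf, e.g.:

# the debug level needs a binary built with --with-debug, which yours has
error_log logs/mos_error.log debug;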

What are the error line(s) on the Tomcat side?

Can you please also post your current 'conf/server.xml' with all the
'Connector' definitions? Thanks.

BR Aleks

I've also seen the issue when running a plain nginx 1.9.11, which is sort of
like building a new nginx without the extra modules; it confirms that the
issue is the upstream closing connections, not nginx.
Since then I've discovered that our TCs had the Connector keepAliveTimeout
set way too low (10 msec); I had mistakenly thought the units were seconds
when they are actually milliseconds.
Since increasing this value a thousandfold, everything seems much better now
:slight_smile:

<Connector  port="8081" protocol="HTTP/1.1"
            maxConnections="8192" maxThreads="200"
            enableLookups="false" tcpNoDelay="true"
            connectionTimeout="10000" keepAliveTimeout="900000"
            socket.soKeepAlive="true" maxKeepAliveRequests="-1" />

Thanks, I will look into your suggestions next time I need to debug TC;
right now they are running at high throttle :wink:

BTW I’m using Apache Tomcat/7.0.27

Here is a tcpdump snippet between nginx (10.45.69.14) and TC (10.45.69.25),
where TC resets (RST) the connection instead of replying:

01:15:49.744283 IP 10.45.69.14.58702 > 10.45.69.25.8081: Flags [P.], seq
31688:39753, ack 191, win 219, options [nop,nop,TS val 38708511 ecr
32578863], length 8065
/PUT / HTTP/1.1^M
Host: mosgenericbackend^M
Content-Length: 7811^M
Content-Type: application/x-www-form-urlencoded^M
User-Agent: Java/1.8.0_66^M
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2^M
^M

01:15:49.744311 IP 10.45.69.25.8081 > 10.45.69.14.58702: Flags [R], seq
1376851868, win 0, length 0
01:15:49.744467 IP 10.45.69.14.58702 > 10.45.69.25.8081: Flags [.], ack
192,
win 219, options [nop,nop,TS val 38708512 ecr 32578874], length 0
01:15:49.744480 IP 10.45.69.25.8081 > 10.45.69.14.58702: Flags [R], seq
1376851869, win 0, length 0
