499 error

I have a configuration that uses nginx + unicorn + rails 3.1. I have a
load balancer in front of this, and the health checks are failing
because nginx is sending a 499 response when I try to use a rails page
for the health check. I can see that the rails response was a 200 from
the logs, but nginx turns this into a 499. I understand that this means
the client closed the connection, but the load balancer’s idle timeout is
60 seconds, and the rails log shows consistent sub-second response times. There’s
a second, more frequent health check that just opens a TCP connection on
port 80 and only validates that the connection was opened. When I use a
static page (either from rails public directory or from another
directory), everything’s fine. I cannot figure out why nginx is sending
this 499 response – one page indicated it was because of not having the
host in the request
(http://www.docunext.com/blog/2009/09/nginx-499-error.html), but I set
the host in my nginx config to a hardcoded hostname, and still no luck.
I am very confused. I can paste config files if that helps.

Here’s a look at the logs, both from rails and from nginx:

Rails (the Host: X.X.X.X line is added from the controller for debug
purposes and is actually the IP address of the server):

Started GET "/health_check" for 127.0.0.1 at Thu Dec 01 08:11:05 +0000 2011
Processing by HealthChecksController#do_health_check as /
HealthCheck Load (0.1ms) SELECT "health_checks".* FROM "health_checks" LIMIT 1
(0.2ms) UPDATE "health_checks" SET "curr" = '2011-12-01 08:11:06.000000', "prev" = '2011-12-01 08:10:35.000000', "updated_at" = '2011-12-01 08:11:06.019519', "check_count" = 157 WHERE "health_checks"."id" = 2
Host: X.X.X.X
Rendered health_checks/do_health_check.html.erb within layouts/application (0.4ms)
Completed 200 OK in 70ms (Views: 25.9ms | ActiveRecord: 0.7ms)

Started GET "/health_check" for 127.0.0.1 at Thu Dec 01 08:11:35 +0000 2011
Processing by HealthChecksController#do_health_check as /
HealthCheck Load (0.1ms) SELECT "health_checks".* FROM "health_checks" LIMIT 1
(0.2ms) UPDATE "health_checks" SET "prev" = '2011-12-01 08:11:06.000000', "check_count" = 158, "curr" = '2011-12-01 08:11:35.000000', "updated_at" = '2011-12-01 08:11:35.926096' WHERE "health_checks"."id" = 2
Host: X.X.X.X
Rendered health_checks/do_health_check.html.erb within layouts/application (0.4ms)
Completed 200 OK in 36ms (Views: 12.9ms | ActiveRecord: 0.6ms)

nginx:

X.X.X.X - - [01/Dec/2011:07:57:05 +0000] "GET /health_check HTTP/1.1" 499 0 "-" "HealthChecker/1.0"
X.X.X.X - - [01/Dec/2011:07:57:15 +0000] "-" 400 0 "-" "-"
X.X.X.X - - [01/Dec/2011:07:57:25 +0000] "-" 400 0 "-" "-"
X.X.X.X - - [01/Dec/2011:07:57:35 +0000] "-" 400 0 "-" "-"
X.X.X.X - - [01/Dec/2011:07:57:35 +0000] "GET /health_check HTTP/1.1" 499 0 "-" "HealthChecker/1.0"

Posted at Nginx Forum:

Hello!

On Thu, Dec 01, 2011 at 03:20:12AM -0500, lsdillard wrote:

> directory), everything’s fine. I cannot figure out why nginx is sending
> this 499 response – one page indicated it was because of not having the
> host in the request

The 499 response is never sent; it’s just a status written to the logs
to show that the client closed the connection before nginx was able to
reply with anything.

Your description suggests that the load balancer uses a half-closed TCP
connection for health checks (i.e. it shuts down its side of the TCP
connection after sending the request). This will result in a
499 for proxied requests, as nginx will detect the connection close
from the client and will terminate the request. You may use

proxy_ignore_client_abort on;
# or fastcgi_ignore_client_abort, whatever you use

to prevent nginx from doing connection close checks. This also stops
it from detecting connection closes from ordinary clients, though, so
you may want to limit this setting to health checks only by adding
a separate location for health checks.
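
For example, something along these lines might work. This is only a
sketch: the upstream name unicorn_app and the backend address
127.0.0.1:8080 are placeholders I made up, not taken from your config,
and the /health_check path is the one from your logs. Adjust them to
match how you actually proxy to unicorn.

```nginx
# Hypothetical upstream; replace with your real unicorn socket or port.
upstream unicorn_app {
    server 127.0.0.1:8080;
}

server {
    listen 80;

    # Exact-match location used only by the load balancer's health check.
    # Connection close checks are disabled here, so a half-closed
    # connection no longer aborts the proxied request.
    location = /health_check {
        proxy_ignore_client_abort on;
        proxy_set_header Host $host;
        proxy_pass http://unicorn_app;
    }

    # All other requests keep the default behaviour
    # (proxy_ignore_client_abort off), so nginx still notices
    # when a client goes away.
    location / {
        proxy_set_header Host $host;
        proxy_pass http://unicorn_app;
    }
}
```

With this layout only the health check location ignores client aborts,
and regular traffic is unaffected.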

Alternatively, you may want to fix your load balancer to keep the
connection open rather than half-closing it.

Maxim D.

Thanks. Setting proxy_ignore_client_abort on addressed my issue for now.
I will see what I can do about the other aspects of the issue, but this
got me past the problem.
