> IMHO, this is overkill. It’s really neat, but I don’t think you need
> to do this at all. We host lots of rails apps and don’t run into
> problems that require that kind of approach. You’ll get error log
I’m confused. You previously said:
> Won’t this have the downside of possibly sending multiple failing
> requests to the upstreams? We used this for a while but ran into
> problems with duplicate requests. For example, we had people sending
> WAY too many mails out in a request, the appserver would time out
> halfway through, it’d send a portion of the emails, and then send the
> request to another upstream. The subsequent requests would do the
> same thing, and people would get the same email for every upstream
> defined.
I then looked at the docs.
http://wiki.codemongers.com/NginxHttpProxyModule#proxy_next_upstream
error - an error occurred while connecting to the server, sending a
request to it, or reading its response;
timeout - a timeout occurred while connecting to the server,
transferring the request, or reading the response from the server;
And assumed that a “timeout” was a subset of “error”. Is that right or
wrong? If I do:
proxy_next_upstream error;
And one of my connections times out, will nginx send the request to the
next backend or not? If it does, then that’s a problem, because it can
cause the same “slow” action to run multiple times on multiple servers.
It means that we do need a “connect_error” option so we can just say:
proxy_next_upstream connect_error;
If not, then we’re all ok, we can just use the “error” option.
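To make the two behaviours concrete, here is a minimal config sketch. The hostnames and timeout values are made up; `proxy_next_upstream`, `proxy_connect_timeout`, and `proxy_read_timeout` are real proxy-module directives, while `connect_error` above is the value being proposed, not one nginx actually accepts:

```nginx
upstream appservers {
    server app1.example.com:8000;  # hypothetical backends
    server app2.example.com:8000;
}

server {
    location / {
        proxy_pass http://appservers;

        # Only pass the request to the next upstream on "error".
        # If "error" excludes timeouts, a slow action that times out
        # is NOT re-sent (and re-run) on every backend in the list.
        proxy_next_upstream error;

        # The timeouts the retry decision interacts with.
        proxy_connect_timeout 5s;
        proxy_read_timeout   60s;
    }
}
```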
Anyway, having said all that, we still do need our solution for some
annoying edge cases. Systems can crash in very, very odd ways. It’s
been a while (I think it was Linux 2.6.18), but we had a system crash
in a state where it would accept TCP connections but wasn’t responding
to them in any way. That was quite nasty, because it meant connections
coming in to that server had to wait the full proxy_read_timeout
before being passed to the next backend server. Since the server was
remote, it took a little while to get it rebooted at the co-location
facility.
Fortunately, because of our above scheme, and the fact that we remotely
check each server every 2 minutes, when that server failed to pass its
“ping” test after 30 seconds, it was marked down in the database and was
automatically taken out of service without intervention required by us.
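For illustration, that check looks roughly like this. This is a hypothetical Python sketch, not our actual monitoring code, and the database update is stubbed out with a placeholder function:

```python
import socket

def ping_backend(host: str, port: int, deadline: float = 30.0) -> bool:
    """Return True only if the backend both accepts the connection
    and actually answers. A host that accepts TCP connections but
    never responds (the failure mode described above) exhausts the
    deadline and returns False."""
    try:
        with socket.create_connection((host, port), timeout=deadline) as s:
            s.settimeout(deadline)
            s.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            return len(s.recv(1)) > 0  # any byte back counts as alive
    except OSError:  # refused, unreachable, or timed out
        return False

def mark_down(server: str) -> None:
    # Stand-in for the database update that takes the server
    # out of the upstream rotation.
    print(f"marking {server} down")
```

The point of sending a byte and waiting for a reply, rather than just connecting, is exactly the crash mode above: a connect-only check would have reported that dead server as healthy.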
Rob