Re: Feature request: Run a script when upstream detected down/up

This sounds like a job for a heartbeat monitor, not a web server.

For our needs this would be best handled by nginx. Here’s why: nginx is
the first to know that it considers a server down and has stopped
routing traffic to it until fail_timeout expires. So regardless of
whether it’s right and the upstream really is down, or it was tripped by
a false positive, the bottom line is that nginx is now ignoring that
upstream for the fail_timeout duration.
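
For context, fail_timeout works together with max_fails on the server
lines of an upstream block; a typical configuration looks something like
this (the host names here are just placeholders):

    upstream backend {
        # After 3 failed attempts, nginx stops sending traffic to the
        # server for 30 seconds, then starts trying it again.
        server app1.example.com max_fails=3 fail_timeout=30s;
        server app2.example.com max_fails=3 fail_timeout=30s;
    }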

Currently nginx is the only one that knows this. So yes, I can use
Heartbeat or whatever other monitoring tools are out there. But those
tools can only say whether they think an upstream is up or down, while
nginx may see the upstream’s state differently (i.e. monitoring could
say it’s up when in fact it missed a condition that made nginx consider
the upstream down, so the monitoring keeps saying the upstream is fine
while nginx is treating it as offline, and all the while we have no idea
this is happening).

Bottom line is that it doesn’t make any difference whether a monitoring
script says an upstream server is down or not. What matters is whether
nginx considers it down or not. And for me to know that, nginx needs to
tell me.

The beauty of it is that it seems like quite a trivial yet very useful
function to implement. Basically, wherever the code is that decides to
ignore an upstream for fail_timeout, it just needs to launch a script
and pass it a parameter such as the name of the upstream that went down.
It seems like something that could be done in minutes. Unfortunately I’m
not a coder, or I would take a crack at it.

What happens if you decide to restart the backend process on one of your
upstream servers? Would you still want your script run?

Yes, absolutely. Before taking a server offline I would tell the script
that nginx calls to ignore reports about upstream xyz, so when nginx
fired it off the script would know not to treat it as an alarm
condition.

Thanks for this opportunity to provide feedback. It would be great if
someone from the nginx dev team could comment on whether this is
something they would consider adding. Thank you!

On Mon, 2008-04-28 at 14:02 -0700, Rt Ibmer wrote:

This sounds like a job for a heartbeat monitor, not a web server.

For our needs this would be best handled by nginx. Here’s why:
nginx is the first to know that it considers a server down and has
stopped routing traffic to it until fail_timeout expires.

Well, it might be, depending on the timing of the heartbeat and
whether/when a particular request causes Nginx to try that backend.

nginx is treating it as offline, and all the while we have no idea this
is happening).

Bottom line is that it doesn’t make any difference whether a
monitoring script says an upstream server is down or not. What
matters is whether nginx considers it down or not. And for me to know
that, nginx needs to tell me.

But it does. It’s in your error logs. There are alternate loggers that
can even allow you to have scripts run when a regex is matched (metalog
for one). I’ve used metalog successfully to deter brute-force ssh
attacks for example.

Metalog is available in most Linux distros (I’ve used it on Gentoo and
Fedora).
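
To illustrate the idea without metalog (which is a syslog replacement
rather than a general file watcher), here is a rough sketch of the same
regex-triggered approach against a file-based error_log. The log path,
the failure pattern, and the upstream-alert.sh script are all
assumptions you would adapt to your own setup and to the exact messages
your nginx version writes:

    #!/usr/bin/env python
    # Rough sketch only: follow the nginx error log and run a script when
    # a line that looks like an upstream failure shows up. The path,
    # pattern and script name are assumptions, not part of nginx or metalog.
    import re
    import subprocess
    import time

    LOG = "/var/log/nginx/error.log"                       # your error_log path
    PATTERN = re.compile(r"upstream.*(timed out|failed)")  # tune to real messages
    NOTIFY = "/usr/local/bin/upstream-alert.sh"            # hypothetical alert script

    f = open(LOG)
    f.seek(0, 2)                            # start at end of file, like tail -f
    while True:
        line = f.readline()
        if not line:
            time.sleep(1)                   # wait for new data to be written
            continue
        if PATTERN.search(line):
            subprocess.call([NOTIFY, line.strip()])

Log rotation and duplicate alerts would still need handling; this is
just to show the shape of the approach.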

The beauty of it is that it seems like quite a trivial yet very useful
function to implement. Basically, wherever the code is that decides to
ignore an upstream for fail_timeout, it just needs to launch a script
and pass it a parameter such as the name of the upstream that went down.
It seems like something that could be done in minutes. Unfortunately I’m
not a coder, or I would take a crack at it.

Except that Nginx is asynchronous, not threaded. This means that when
your script is called, Nginx will now be delayed while the script is
launched (and what if the script fails?). You might be able to work
around this, but I suspect it won’t be as trivial as you might hope.

Regards,
Cliff

Cliff W. wrote:

your script is called, Nginx will now be delayed while the script is
launched (and what if the script fails?).

Nginx will just have to wait until fork returns.

As you say, there is no “immediate” way to know if the script fails, but
Nginx installs a SIGCHLD handler, so you will get NOTICE messages in the
log file ("unknown process ... exited with code ...").
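
To make that concrete, the pattern being discussed is just a
fire-and-forget fork/exec with SIGCHLD reaping. The sketch below is a
generic illustration in Python rather than nginx code, and the script
path is hypothetical:

    import os
    import signal

    # Let the kernel reap finished children automatically so the child
    # never becomes a zombie (nginx handles this with its own SIGCHLD
    # handler and logs the exit status instead).
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)

    def notify_upstream_down(upstream):
        # The caller only waits for fork() itself to return.
        pid = os.fork()
        if pid == 0:
            # Child: replace ourselves with the (hypothetical) alert script.
            try:
                os.execv("/usr/local/bin/upstream-alert.sh",
                         ["upstream-alert.sh", upstream])
            finally:
                os._exit(127)   # exec failed; never fall back into parent code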

For the OP: executing a script when a backend fails is trivial; it
should be possible to do it just by adding a new module.

However, executing a script when a backend comes back alive again is not
easy, as far as I know.

You might be able to work
around this, but I suspect it won’t be as trivial as you might hope.

Regards,
Cliff

Regards,
Manlio P.