Feature request: Run a script when upstream detected down/up

I am using Amazon EC2 with nginx primarily for load balancing and
proxying requests using proxy_pass. In this EC2 environment upstream
servers can fail without warning.

It would be a fantastic addition to nginx to have it run a shell
script when it detects that it needs to skip over an upstream server
because it is not responding, in accordance with proxy_next_upstream.

For instance, if I’ve set this:

proxy_connect_timeout 3;
proxy_send_timeout 3;
proxy_read_timeout 3;
proxy_next_upstream error timeout http_500 http_503;

Then, at any point when nginx decides it must skip a particular
upstream server (based on the above config), I’d want it to run my bash
script /etc/scripts/upstream_down. Likewise, any time it detects that an
upstream server it was previously skipping is back and available, I’d
want it to call another script, /etc/scripts/upstream_isback. I would
only want it to call upstream_down the first time a server goes down,
not on each retry (unless nginx detects it has come back up and it
later goes down again).

Or perhaps it can be simplified so that there is one command nginx will
execute whenever an upstream server changes from down to up or vice
versa, with nginx passing it parameters to indicate whether the status
has changed to down or to up.

In either case I’d like to get parameters that tell me what the server
name or IP is (as defined in the upstream config following the “server”
directive) and perhaps what the reason is (i.e. whether the upstream
server was marked down due to “http_500” vs. “timeout”, etc.).
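
To make the interface concrete, here is a rough sketch of the spawning
side (this is NOT existing nginx code, just a self-contained
illustration; the script path and argument order are only my suggestion):

/* Hypothetical sketch: spawn the proposed hook without blocking the
 * server, passing the upstream address and the reason it was skipped.
 * The script path and arguments are made up for this example. */
#include <signal.h>
#include <unistd.h>

static void run_upstream_hook(const char *script, const char *server,
                              const char *reason)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* child: e.g. /etc/scripts/upstream_down 10.0.0.5:8080 timeout */
        execl(script, script, server, reason, (char *) NULL);
        _exit(127);                       /* exec failed */
    }
    /* parent returns immediately and does not wait for the child */
}

int main(void)
{
    signal(SIGCHLD, SIG_IGN);             /* auto-reap children */
    run_upstream_hook("/etc/scripts/upstream_down",
                      "10.0.0.5:8080", "timeout");
    return 0;
}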

With this feature added, I would set up my script so that it would:

a) send me an email notification warning me that upstream server xyz
just went down (or that xyz just came back up), and
b) in the case where a box went down, automatically use the appropriate
EC2 command to immediately launch a replacement EC2 instance!

I am interested to hear from others if they would find it useful to call
a script upon status change of upstream servers. Likewise I am curious
whether this would be a relatively easy feature to implement. At first
I was going to suggest having the email notification built into nginx as
a feature, but then thought it would be much more useful just to have it
kick off a script where we could send the email ourselves and do other
actions as well (such as launch replacement instances).

Thanks!!


On Mon, 2008-04-28 at 08:44 -0700, Rt Ibmer wrote:

I am using Amazon EC2 with nginx primarily for load balancing and
proxying requests using proxy_pass. In this EC2 environment upstream
servers can fail without warning.

It would be a fantastic addition to nginx to have it run a shell
script when it detects that it needs to skip over an upstream
server because it is not responding, in accordance with
proxy_next_upstream.

This sounds like a job for a heartbeat monitor, not a web server. What
happens if you decide to restart the backend process on one of your
upstream servers? Would you still want your script run?

Regards,
Cliff

On 4/28/08, Cliff W. [email protected] wrote:

This sounds like a job for a heartbeat monitor, not a web server.

Yup. People (such as myself) are using nginx for reverse proxy + load
balancing. It works great except for when upstreams go up and down, or
are up but should fail a health check (perhaps the filesystem is stale,
or corrupt, or something else; some sort of simple challenge-response
check would be able to validate that).

It doesn’t do “smart enough” load balancing, but it wasn’t designed
for that. Although, if someone wanted to add a couple patches to make
nginx work like that, I would not be upset :)

Hello,

I have a small suggestion to make (slightly off topic):

I think this problem would be best solved by having nginx capable of
logging to a file type other than a regular file. For instance, if the
nginx error_log directive
(http://wiki.codemongers.com/NginxMainModule#error_log) could support a
TCP/IP socket, a unix domain socket, or a named pipe, then interesting
programs could be written around it. This may be an easy approach:

a) Having an external program monitor nginx logs for a particular log
message (or “event”) is not so easy when the file in question is a
regular file. The reason is that regular files are always ready to read
on Linux (even if the FD is at EOF), so you can’t use an open file
descriptor to a regular file in a select()/poll() system call and
expect it to work the way you want (i.e. signal ready-to-read).
It can be done (tail -f does it), but it’s rather messy to combine
file-change notification with seeking to the appropriate offset. With
sockets/FIFOs these problems go away: you can set up a socket server to
listen on the given socket for connections, and when nginx starts up,
the error log initialization code can connect to the TCP or unix domain
socket or FIFO in question; all calls to ngx_log_* will then naturally
be written to the socket instead of to a regular file.

In fact, if one adopts the syntax of socat, we might have an nginx conf
that looks like:

error_log TCP4:localhost:12345
-or-
error_log UNIX:/path/to/socket

The ngx_log_* functions (and the output log format) are structured
enough that you can parse the output (possibly designating an entire
“event.” namespace to indicate interesting events, such as an upstream
going down, a memcache server going down, an IMAP connection timing
out, etc.).

b) With this approach, the problem of counting the number of times an
event fired within a particular time window is solved: the socket
server receives the data in real time, and you can always aggregate
similar events and fork appropriate scripts if you want to.

c) Also, if this event approach is taken further, it can be used as a
non-intrusive way of profiling nginx (measuring response times,
measuring the number of simultaneous connections, etc.).

d) And it is also purely in the spirit of a webserver: Apache, if you
recollect, can pipe its access log to a program. That would be messy
for nginx, and I think, IMHO, the sockets approach would be better;
this way, nginx need not even be blocked on disk I/O in extreme cases.
The health of the TCP connection is also relevant here, though, and the
socket server needs to be able to handle large volumes of nginx logs in
a short period of time. (A minimal consumer sketch follows below.)
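
To make (a) concrete, here is a minimal sketch of the consumer side,
assuming the hypothetical UNIX:/path/to/socket syntax above existed
(the socket path and the "event." prefix are made up):

/* Minimal sketch of a log consumer: accept one connection on a unix
 * domain socket and scan each log line for "event." markers. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
    const char *path = "/tmp/nginx-log.sock";     /* made-up path */
    struct sockaddr_un addr;
    char line[4096];

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    unlink(path);                                 /* remove stale socket */
    bind(srv, (struct sockaddr *) &addr, sizeof(addr));
    listen(srv, 1);

    int cli = accept(srv, NULL, NULL);            /* nginx connects here */
    FILE *in = fdopen(cli, "r");

    while (fgets(line, sizeof(line), in) != NULL) {
        if (strstr(line, "event.") != NULL) {
            /* aggregate, or fork a notification script, etc. */
            printf("got event: %s", line);
        }
    }
    return 0;
}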

So if Herr Igor thinks it’s OK, I think this could be coded without too
much change.

Regards,
Mansoor Peerbhoy

PS: having nginx execute hooks directly is, IMHO, not a very good idea.
Sure, it can be done, but it’s not in the spirit of nginx.

----- Original Message -----
From: “François Battail” [email protected]
To: [email protected]
Sent: Wednesday, April 30, 2008 11:50:42 AM GMT +05:30 Chennai, Kolkata,
Mumbai, New Delhi
Subject: Re: Feature request: Run a script when upstream detected
down/up

Aleksandar L. <al-nginx@…> writes:

I think for this the embedded script language
(perl/neko/lua/python/your_preferred_lang) will be very helpful

I’m not too sure. If, to write some bytes to memory, we need to
instantiate an interpreter or a VM, and after writing these bytes run a
garbage collector, there’s something wrong.

Brainstorming:

1.) add a hook into the upstream module, maybe some other modules also
2.) use this hook in $EMBEDDED_SCRIPT (currently only perl) to write
into $OUTPUT_THING (file/shm/…)
3.) use an external program which monitors the $OUTPUT_THING

The benefit of using an embedded script language is that the user can
write some info into the output, together with some predefined
variables from nginx.

Yes, but that is not the idea. It’s about monitoring indicators like:

counters on connection state (same as the stub_status module)
upstream server statuses
error counters (like running out of fds…)
min/max/avg on resource allocation

That way it will be possible to have useful information for tuning or
monitoring servers running Nginx in an efficient way. It’s not designed
to help debug an application.

Best regards.

Mansoor P. <mansoor@…> writes:

a) for instance, having an external program monitor nginx logs for a
particular log message (or “event”), […]

OK, that’s monitoring, but it still requires a lot of system calls and
even bandwidth.

b) with this approach, the problem of counting the number of times an event
was fired in a particular time […]

True, but then it’s no longer monitoring; it’s debugging.

c) also, if this event approach is taken further, this can also be used as a
non-intrusive way of profiling

True, but then it’s no longer monitoring; it’s profiling.

d) and it is also purely in the spirit of a webserver – apache, if you
recollect, can pipe its access log to a program […]

It’s the Unix philosophy (“everything is a file”), but sometimes a more
pragmatic approach is more efficient. Redirecting logs to another
server or program can be useful, but most likely for heavy statistical
computation, intrusion attempt detection, or legally required access
log archiving.

What’s the purpose of an error log? To help trace back an incident.
What’s the purpose of monitoring? To tell you there is an incident and
to give you valuable data to understand what kind of incident it is.

It’s not the same thing, so it’s not the same tool.

At this time, the few things monitored by Nginx are provided by the
stub_status module. How it works: you make an HTTP GET on a specific
location and it returns some variable values in text form.

What I propose is a more general mechanism: using shared memory instead
of an HTTP request. As most scripting languages offer access to shared
memory, you can do whatever you want with the data afterwards.

How should it work?

A Nginx module wanting to have some variables monitored should register
these during init, with something like:

ngx_monitored_value_t *ngx_register_monitoring_value(ngx_str_t *name);

Nginx will then reserve an area in the shared memory to store:

“name:XXXXXXXX\n”

After startup the module will modify the value using the atomic_t
primitives. And at the end of the main worker cycle, for each monitored
variable, we simply do a loop to snprintf() the value into its area in
shared memory.

For security reasons, the area is of course written only by Nginx;
external programs only read it.

Then you will be able to exploit the data, something like this:

accept:00000002
read:00000001
write:00000001
wait:00000001
mybackendserver1status:00000000
mybackendserver2status:00000001
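
A standalone sketch of the writer side (I use POSIX shm_open() here
only to keep the example self-contained; inside Nginx the area would
come from its own shared memory allocation, and the segment name is
made up):

/* Sketch: snprintf() the monitored counters into a shared memory area
 * in the "name:XXXXXXXX" format shown above, once per worker cycle.
 * Compile with -lrt on older systems. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define AREA_SIZE 4096

int main(void)
{
    int fd = shm_open("/nginx-mon", O_CREAT | O_RDWR, 0644);
    ftruncate(fd, AREA_SIZE);
    char *area = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    unsigned accept_n = 2, read_n = 1, write_n = 1, wait_n = 1;

    /* one line per monitored variable, fixed-width hexadecimal */
    snprintf(area, AREA_SIZE,
             "accept:%08x\nread:%08x\nwrite:%08x\nwait:%08x\n",
             accept_n, read_n, write_n, wait_n);

    munmap(area, AREA_SIZE);
    close(fd);
    return 0;
}

A monitoring script then opens the same segment read-only and parses
those lines.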

What does it cost? The cost of modifying the variables under a mutex or
spinlock (already the case for the stub_status module) and, for each
variable, the cost of an snprintf() to transform an integer into its
hexadecimal representation. Say 100 assembly instructions each time the
main worker does a cycle; it’s nothing. I’m sure it’s way faster than
the stub_status module ;)

It’s easy, simple, fast and useful (at least for one person). The
stub_status module is not affected; only the Nagios and Collectd
plugins need to be upgraded to read the shared memory. Not a big deal,
I think.

Best regards.

Manlio,

Yes, I see what you’re saying.
It is always going to be a problem if the listener cannot consume the
data fast enough.
In any case, if you really dig deep, you can see that NGINX is async in
that it always reads from a socket only when it is ready to be read
(select/poll/epoll/kevent etc. fires).
But, as you know, NGINX uses non-blocking sockets. So when it wants to
write to a socket (as far as my limited understanding goes), it doesn’t
wait for it to become ready-to-write; it just goes and does the write.
If the kernel deems that the write would have blocked, it will return
EAGAIN (see send(2)) rather than block the caller, if that’s what
you’re worried about.
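
In code, the pattern I am describing is roughly this (a sketch; it
assumes the socket was already put into non-blocking mode with
O_NONBLOCK):

/* Non-blocking write: either some bytes are written, or the call fails
 * immediately with EAGAIN/EWOULDBLOCK instead of blocking the caller. */
#include <errno.h>
#include <unistd.h>

ssize_t try_log_write(int sock, const char *buf, size_t len)
{
    ssize_t n = write(sock, buf, len);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* socket buffer full: caller must buffer or drop the message */
        return 0;
    }
    return n;    /* bytes written (possibly partial), or -1 on error */
}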

Yes, given the speed and volume of debug logging, it is likely that
NGINX would need a memory buffer to hold the log messages until the
socket can be written to, but then, I guess that is overkill.

As someone pointed out, logging is one thing, profiling is quite
another, and monitoring is yet something else. Different needs.

But think about it. Agreed that NGINX is astonishingly fast in every
aspect of the game, but the ability to log to a socket can, at the very
least, provide centralized logging in cases where several nginx proxies
sit behind a load balancer. The value of such a centralized log is,
IMHO, a very good argument for extending the support to log to a
socket.

And it isn’t as if NGINX needs to be configured by default to write to
a socket. I’m sure that if people wish to configure it that way, they
will understand that NGINX expects the log-consumer process to be fast
enough.

Also, again off topic: what is NGINX’s performance when log files
reside on an NFS (or CIFS, etc.) partition? Has anyone tested that? I
should imagine that the reliability of the network will affect NGINX’s
ngx_log_* times.

Regards,
Mansoor

----- Original Message -----
From: “Manlio P.” [email protected]
To: [email protected]
Sent: Wednesday, April 30, 2008 3:07:07 PM GMT +05:30 Chennai, Kolkata,
Mumbai, New Delhi
Subject: Re: Feature request: Run a script when upstream detected
down/up

Mansoor P. wrote:

hello,

i have a small suggestion to make (slightly off topic):

I think this problem would be best solved by having nginx capable of logging to a file type other than a regular file. […] When nginx starts up, the error log initialization code can connect to the TCP or unix domain socket or FIFO in question, and then all calls to ngx_log_* will naturally be written to the socket instead of to a regular file.

This is not as easy as it seems.
First of all, with the current architecture of Nginx, writing to the
error log is assumed to be synchronous.

This means that if you want to send log messages to a TCP server, the
performance will be bad.

You can use a UDP connection, but what happens if Nginx sends data more
quickly than the server is able to read it?
UDP has no flow control.

[…]

Regards Manlio P.

On Wed, Apr 30, 2008 at 05:03:53AM -0700, Mansoor P. wrote:

Manlio,

Yes, I see what you’re saying.
It is always going to be a problem if the listener cannot consume the data fast enough.
[…] If the kernel deems that the write would have blocked, it will return EAGAIN (see send(2)) rather than block the caller, if that’s what you’re worried about.

The problem arises when nginx produces log messages faster than the
logger can consume them. You can then:

  1. drop messages
  2. buffer them until the consumer catches up
  3. block writes

None of these options is particularly nice, but if I were to implement
an external log consumer for nginx, I’d slap a traditional syslog
(514/udp) interface on it and be done with it, lost messages be damned
(maybe just use a bigger send buffer for the socket).
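
Something along these lines (just a sketch; the PRI value, destination
and message are hardcoded for illustration):

/* Fire-and-forget syslog sender: a UDP socket with an enlarged send
 * buffer; if the datagram cannot be sent, the message is simply lost,
 * which is the trade-off accepted above. */
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    int sndbuf = 1 << 20;                 /* the "bigger send buffer" */
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(514);            /* syslog */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    /* <190> = facility local7, severity info */
    const char *msg = "<190>nginx: upstream 10.0.0.5 marked down";
    sendto(s, msg, strlen(msg), 0,
           (struct sockaddr *) &dst, sizeof(dst));

    close(s);
    return 0;
}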

Note that if you buffer the log messages inside nginx, the buffer may
actually grow infinitely large.

You can use a UDP connection, but what happens if Nginx sends data more
quickly than the server is able to read it?
UDP has no flow control.

QFT.

Best regards,
Grzegorz N.

Grzegorz N. wrote:

[…] if I were to implement an external log consumer for nginx, I’d slap a traditional syslog (514/udp) interface on it and be done with it, lost messages be damned (maybe just use a bigger send buffer for the socket).

RFC 3164 requires that each packet MUST NOT be greater than 1024
bytes.

Note that if you buffer the log messages inside nginx, the buffer may
actually grow infinitely large.

Regards Manlio P.


On Wed, 2008-04-30 at 05:03 -0700, Mansoor P. wrote:

Manlio,

Yes, I see what you’re saying.
It is always going to be a problem if the listener cannot consume the
data fast enough.

I think if Nginx only updated counters (similar to OpenVZ’s bean
counters) this would not be an issue, and it would work well with the
shared memory/virtual file approach (it also fits better with the idea
of event monitoring vs. logging/post-mortem). Rather than a stream of
data, you have a “file” of fixed size that’s constantly updated. It
also simplifies the parsing of the data by an external application.

If a virtual file system (like /proc) were used, then it should also be
possible to use inotify (or similar) from external apps to be notified
of events (although this might be too busy).
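
For example (a sketch; the path is made up, and the handler just
reports that something changed):

/* Block until the counters file is modified, then re-read it. */
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = inotify_init();

    inotify_add_watch(fd, "/tmp/nginx/counters", IN_MODIFY);

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));  /* blocks until an event */
        if (n <= 0)
            break;
        printf("counters changed, re-read them here\n");
    }

    close(fd);
    return 0;
}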

Regards,
Cliff

Cliff W. wrote:

[…] it would work well with the shared memory/virtual file approach (it
also fits better with the idea of event monitoring vs.
logging/post-mortem). Rather than a stream of data, you have a “file”
of fixed size that’s constantly updated. It also simplifies the parsing
of the data by an external application.

There is a potential problem with this solution.
To be able to write all this information atomically, Nginx would have
to use a spin lock to lock the memory region.

However, the default lock implementation in Nginx uses an atomic
variable visible only to Nginx.

This means that an external process has no way to synchronize memory
access with Nginx, and it may read inconsistent values.

Moreover, even if the external process could synchronize with Nginx,
there would be the risk of that external process holding the lock for
too long, thus blocking Nginx.

The solution is to store only atomic_t values in the shared memory, so
that each one can be updated atomically.

As an example:
/tmp/nginx/01
/tmp/nginx/01/address
/tmp/nginx/01/name
/tmp/nginx/01/tries
/tmp/nginx/01/down

where 01 is the upstream peer number, and only the tries and down
files will be updated (they will contain an integer value in binary
format).
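
A sketch of the reader side (C11-style atomics are an anachronism here,
but they show the idea: the file contains one native integer that Nginx
updates atomically, so the reader needs no lock shared with Nginx):

/* Map the "down" file and read the flag with a single atomic load. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/nginx/01/down", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    _Atomic int *down = mmap(NULL, sizeof(*down), PROT_READ,
                             MAP_SHARED, fd, 0);

    printf("peer 01 down flag: %d\n",
           atomic_load_explicit(down, memory_order_relaxed));

    munmap((void *) down, sizeof(*down));
    close(fd);
    return 0;
}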

If a virtual file system (like /proc) were used, then it should also be
possible to use inotify (or similar) from external apps to be notified
of events (although this might be too busy).

This should work with a normal filesystem too.


Regards Manlio P.

Grzegorz N. wrote:

On Thu, May 01, 2008 at 11:42:39AM +0200, Manlio P. wrote:

RFC 3164 requires that each packet MUST NOT be greater than 1024 bytes.

But you might have an unspecified number of those in the socket buffer
(again, when the consumer cannot keep up).

You are right, I have confused the socket send buffer with the MTU :).
Sorry.


Regards Manlio P.
