Nginx fails to accept new connections if the active worker crashes

Hi All

I run nginx with multiple worker processes, together with an nginx
module of my own. When the module crashed because of a bug, I noticed
that the crash left nginx in a state where it could no longer accept
new connections.

After removing my module, killing the “active” worker (the one that
appears to take the new requests) with a SIGHUP again caused nginx to
hang. Killing the other worker(s) seems to work just fine.

Further investigation (with nginx at debug level) showed that all
threads are fine, but none of the workers manage to get the
ngx_accept_mutex lock.

The master tries to release the accept mutex if the dead process was
holding it
[https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503],
but it doesn't look like the value it compares against (the PID of the
holding process) is set anywhere.
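To make the suspected failure concrete, here is a minimal single-process simulation. It is a sketch under stated assumptions, not nginx code: all names (`trylock_pid`, `trylock_flag`, `master_reap_lockword`) are hypothetical, C11 atomics stand in for `ngx_atomic_t` and `ngx_atomic_cmp_set()`, and PIDs are plain numbers. What it shows: a master-side compare-and-set from `pid` to 0, like the `ngx_atomic_cmp_set(ngx_accept_mutex_ptr, pid, 0)` call in ngx_process.c, can only release the lock if the lock word actually contains the holder's PID. If the lock word stores anything else, the call silently does nothing and the mutex stays held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word in shared memory; 0 means "free". */
typedef atomic_long lock_word_t;

#define LOCKED_FLAG 1L  /* value stored by the flag-only variant */

/* Variant 1: the lock word records the owner's PID. */
static bool trylock_pid(lock_word_t *lock, long pid)
{
    long expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, pid);
}

/* Variant 2: the lock word only records "locked", not who holds it. */
static bool trylock_flag(lock_word_t *lock)
{
    long expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, LOCKED_FLAG);
}

/* Master side after reaping a dead worker: release the lock only if
 * its value equals the dead worker's PID. Returns true if released. */
static bool master_reap_lockword(lock_word_t *lock, long dead_pid)
{
    assert(dead_pid != 0);  /* 0 would match the "free" state */
    long expected = dead_pid;
    return atomic_compare_exchange_strong(lock, &expected, 0);
}
```

With variant 1 the master frees the lock left behind by a crashed worker; with variant 2 the same compare-and-set never matches, so no worker can take the lock again.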

I have only been using nginx for a couple of months, so I am not very
sure of the diagnosis; please feel free to correct me.

uname -a: Linux faskiri-pc 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC 2010 x86_64 GNU/Linux

nginx -V:
nginx: nginx version: nginx/1.0.5
nginx: built by gcc 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)
nginx: configure arguments: --without-http_ssi_module --without-http_geo_module --without-http_fastcgi_module --without-http_uwsgi_module --without-http_scgi_module --without-http_memcached_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-pcre --with-debug

I will be grateful for any advice.

Best Regards
+Fasih

Posted at Nginx Forum:

Fasih,

On Nov 15, 2011, at 11:32 AM, faskiri.devel wrote:

Further investigations (with nginx at debug level) showed that all
threads are fine but none of the workers are getting the
ngx_accept_mutex_lock.

Master tries to release the ngx_accept_mutex_lock if the dead process
was holding it
[https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503]
but doesn't look like the value is set anywhere.

I have been using nginx only for a couple of months now so I am not very
sure of the diagnosis, please feel free to correct.

Thanks for spotting this one. It's a known issue, and we're currently
working on a fix. In the meantime, you can switch the accept mutex off
as a workaround; the only potential downside is a minor increase in
CPU utilization.

Hi Andrew

Thanks for the prompt reply.

As a temporary fix, I had created a variable in shared memory to track
which PID is holding the mutex, so that the check in
[https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503]
works. It works fine for me; I hadn't realized I could switch the mutex
off. Are we talking about putting “accept_mutex off” in the nginx.conf
file? It would be great if you could explain what that actually does,
i.e. how new requests will be handed off to the workers.
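The temporary fix described above (recording the holder's PID next to the lock in shared memory so that the master's check can match it) can be sketched as follows. This is a single-process simulation, not the actual patch: the names `accept_shm_t`, `worker_trylock`, `accept_unlock` and `master_reap_owner` are hypothetical, C11 atomics stand in for nginx's shared-memory primitives, and the lock word deliberately does not record its owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared-memory layout: a flag-only lock word followed by
 * a separate owner field, placed right after the lock as in the patch. */
typedef struct {
    atomic_long lock;     /* 0 = free, 1 = locked           */
    atomic_long held_by;  /* PID of the holder, 0 if none   */
} accept_shm_t;

static bool worker_trylock(accept_shm_t *shm, long pid)
{
    long expected = 0;
    if (!atomic_compare_exchange_strong(&shm->lock, &expected, 1)) {
        return false;
    }
    atomic_store(&shm->held_by, pid);  /* record the owner after locking */
    return true;
}

static void accept_unlock(accept_shm_t *shm)
{
    atomic_store(&shm->held_by, 0);    /* reset the owner before unlocking */
    atomic_store(&shm->lock, 0);
}

/* Master side, after reaping dead_pid: force-release the lock only if
 * the dead process is recorded as the holder. Returns true if released. */
static bool master_reap_owner(accept_shm_t *shm, long dead_pid)
{
    assert(dead_pid != 0);
    if (atomic_load(&shm->held_by) != dead_pid) {
        return false;
    }
    accept_unlock(shm);
    return true;
}
```

One design caveat: the owner is recorded only after the lock is taken, so a worker that dies between those two steps still leaves the lock held with no recorded owner. The patch has the same small window.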

Best Regards
+Fasih


On Nov 15, 2011, at 3:37 PM, faskiri.devel wrote:

how will the new requests be handed off to workers.

Yes, accept_mutex off. The accept mutex tries to prevent workers from
competing over accept() on the listening sockets (in the kernel). In
other words, without the accept mutex, workers may all try to check for
new events on the listening sockets simultaneously, which may lead to a
slight increase in CPU usage.
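A rough sketch of the turn-taking described above, loosely modeled on the enable/disable logic of `ngx_trylock_accept_mutex()` and simulated in one process. Assumptions: the names `worker_t`, `worker_begin_iteration` and `worker_end_iteration` are hypothetical, adding or removing the listening sockets from a worker's event set is reduced to a boolean, and the lock word holds the winner's PID. The worker that wins the mutex watches the listening sockets; a worker that loses it after holding it stops watching them; the holder releases the mutex after handling its events. With accept_mutex off there is no such gating, and every worker keeps the listening sockets in its event set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-worker state. */
typedef struct {
    long pid;
    bool mutex_held;      /* did we win the mutex in the last iteration? */
    bool accept_enabled;  /* are the listening sockets in our event set? */
} worker_t;

/* Start of an event-loop iteration. Returns true if this worker is the
 * one watching for new connections in this iteration. */
static bool worker_begin_iteration(atomic_long *lock, worker_t *w)
{
    long expected = 0;
    if (atomic_compare_exchange_strong(lock, &expected, w->pid)) {
        w->accept_enabled = true;   /* won: watch the listening sockets */
        w->mutex_held = true;
        return true;
    }
    if (w->mutex_held) {
        w->accept_enabled = false;  /* lost after holding: stop watching */
    }
    w->mutex_held = false;
    return false;
}

/* End of the iteration: the holder releases the mutex so another
 * worker can win the next turn. */
static void worker_end_iteration(atomic_long *lock, worker_t *w)
{
    if (w->mutex_held) {
        long expected = w->pid;
        bool ok = atomic_compare_exchange_strong(lock, &expected, 0);
        assert(ok);  /* only the holder may release */
        (void) ok;
    }
}
```

If the holder crashes after winning a turn and before releasing it, the lock word keeps its PID and every other worker keeps losing, which matches the hang reported at the start of this thread.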

Depending on your OS and the event notification mechanism, the results
may vary.

Actually, it's quite safe to try, and we'd appreciate your feedback
here!

And as I mentioned, we’ve been working on fixing the situation with
crashed workers and mutex lock-ups.

Thanks for your attention!

It works fine with accept_mutex off. I will run my stress test harness
over the weekend to see the impact on performance. If there is a
significant difference, I will update the thread.

For my own understanding: I had implemented a workaround for this
problem. Is your fix along the same lines?

My patch:

diff --git a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
index c57d37e..a6ed725 100644
--- a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
@@ -49,6 +49,10 @@ ngx_atomic_t *ngx_connection_counter = &connection_counter;
 
 ngx_atomic_t         *ngx_accept_mutex_ptr;
 ngx_shmtx_t           ngx_accept_mutex;
+// This is shared var protected by ngx_use_accept_mutex. Access only when
+// ngx_accept_mutex is held. The var stores the PID of the process currently
+// holding the mutex
+ngx_pid_t            *ngx_accept_mutex_held_by;
 ngx_uint_t            ngx_use_accept_mutex;
 ngx_uint_t            ngx_accept_events;
 ngx_uint_t            ngx_accept_mutex_held;
@@ -254,6 +258,7 @@ ngx_process_events_and_timers(ngx_cycle_t *cycle)
     }
 
     if (ngx_accept_mutex_held) {
+        *ngx_accept_mutex_held_by = 0;
         ngx_shmtx_unlock(&ngx_accept_mutex);
     }
 
@@ -526,6 +531,9 @@ ngx_event_module_init(ngx_cycle_t *cycle)
     {
         return NGX_ERROR;
     }
 
+    // cl = 128 bytes are available for us to use. ngx_shmtx_create uses
+    // ngx_atomic_t bytes to assign to mutex->lock, using the memory after that
+    ngx_accept_mutex_held_by = (ngx_pid_t*) (shared + sizeof(ngx_atomic_t));
     ngx_connection_counter = (ngx_atomic_t *) (shared + 1 * cl);
 
diff --git a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
index 778da52..f1b06d4 100644
--- a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
@@ -501,6 +501,7 @@ extern ngx_atomic_t *ngx_connection_counter;
 
 extern ngx_atomic_t          *ngx_accept_mutex_ptr;
 extern ngx_shmtx_t            ngx_accept_mutex;
+extern ngx_pid_t             *ngx_accept_mutex_held_by;
 extern ngx_uint_t             ngx_use_accept_mutex;
 extern ngx_uint_t             ngx_accept_events;
 extern ngx_uint_t             ngx_accept_mutex_held;
diff --git a/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c b/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
index 2355d1b..feb4568 100644
--- a/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
@@ -298,6 +298,10 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
         ngx_log_debug0(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
                        "accept mutex locked");
 
+        *ngx_accept_mutex_held_by = ngx_pid;
+
+        // If the mutex was already held by me and we are not using
+        // RTSIG_EVENT, no need to enable accept_events
         if (ngx_accept_mutex_held
             && ngx_accept_events == 0
             && !(ngx_event_flags & NGX_USE_RTSIG_EVENT))
@@ -306,6 +310,8 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
         }
 
         if (ngx_enable_accept_events(cycle) == NGX_ERROR) {
+            // No one is holding the mutex now
+            *ngx_accept_mutex_held_by = 0;
             ngx_shmtx_unlock(&ngx_accept_mutex);
             return NGX_ERROR;
         }
@@ -317,8 +323,9 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
     }
 
     ngx_log_debug1(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
-                   "accept mutex lock failed: %ui", ngx_accept_mutex_held);
+                   "accept mutex lock failed: held by: %P", *ngx_accept_mutex_held_by);
 
+    // If I held it earlier, but not anymore (ngx_trylock_accept_mutex failed)
     if (ngx_accept_mutex_held) {
         if (ngx_disable_accept_events(cycle) == NGX_ERROR) {
             return NGX_ERROR;
diff --git a/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c b/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
index 6055587..b66d4b3 100644
--- a/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
+++ b/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
@@ -492,17 +492,18 @@ ngx_process_get_status(void)
         }
 
-        if (ngx_accept_mutex_ptr) {
-
-            /*
-             * unlock the accept mutex if the abnormally exited process
-             * held it
-             */
-
-            ngx_atomic_cmp_set(ngx_accept_mutex_ptr, pid, 0);
+        // If the accept mutex is held by the abnormally exited process
+        // Note: If the process holding this has died, the mutex cannot be
+        // acquired by someone else, in which case, ngx_accept_mutex_held_by is
+        // free to be accessed
+        if (ngx_accept_mutex_held_by != NULL && pid == *ngx_accept_mutex_held_by) {
+            ngx_log_error(NGX_LOG_INFO, ngx_cycle->log, 0,
+                          "PID %P held the accept mutex. Releasing", pid);
+            // Reset the value before unlocking
+            *ngx_accept_mutex_held_by = 0;
+            ngx_shmtx_unlock(&ngx_accept_mutex);
         }
 
         one = 1;
         process = "unknown process";
