Nginx fails to accept new connection if active worker crashes

luislavena · November 15, 2011, 8:32am

Hi All

I use nginx configured with multiple workers. I also have an nginx
module that crashed due to an error when I noticed that the module crash
leaves nginx in a state where it cannot accept new calls.

Removing my module and killing the “active” worker (the one which seems
to take the new request) with a SIGHUP again caused nginx to hang.
Killing the other worker(s) seem to be working just fine.

Further investigations(with nginx at debug level) showed that all
threads are fine but none of the workers are getting the
ngx_accept_mutex_lock.

Master tries to release the ngx_accept_mutex_lock if the dead process
was holding it [
https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503]
but doesnt look like the value is set anywhere.

I have been using nginx only for a couple of months now so I am not very
sure of the diagnosis, please feel free to correct.

uname -a: Linux faskiri-pc 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16
14:58:24 UTC 2010 x86_64 GNU/Linux

nginx -V: nginx: nginx version: nginx/1.0.5 nginx: built by gcc 4.4.5
(Ubuntu/Linaro 4.4.4-14ubuntu5) nginx: configure arguments:
–without-http_ssi_module --without-http_geo_module
–without-http_fastcgi_module --without-http_uwsgi_module
–without-http_scgi_module --without-http_memcached_module
–without-mail_pop3_module --without-mail_imap_module
–without-mail_smtp_module --with-pcre --with-debug

I will be grateful for any advice.

Best Regards
+Fasih

Posted at Nginx Forum:

faskiri.devel · November 15, 2011, 10:16am

Fasih,

On Nov 15, 2011, at 11:32 AM, faskiri.devel wrote:

Further investigations(with nginx at debug level) showed that all
threads are fine but none of the workers are getting the
ngx_accept_mutex_lock.

Master tries to release the ngx_accept_mutex_lock if the dead process
was holding it [

https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503]

but doesnt look like the value is set anywhere.

I have been using nginx only for a couple of months now so I am not very
sure of the diagnosis, please feel free to correct.

Thanks for spotting this one. It’s kind of a known issue and we’re
working on a fix currently. In the meanwhile you can switch accept mutex
off as a workaround (the only downside could potentially be in minor
increase of CPU utilization).

faskiri.devel · November 15, 2011, 12:37pm

Hi Andrew

Thanks for the prompt reply.

As a temporary fix I had created a variable in the shared memory to
track which pid is holding the mutex so that the check in
[https://svn.nginx.org/nginx/browser/nginx/trunk/src/os/unix/ngx_process.c?annotate=blame#L503]
works. It works fine for me, hadnt realized I could switch it off. Are
we talking about putting “accept_mutex off” in the nginx.conf file? It
will be great if you could explain what that will actually do, as in,
how will the new requests be handed off to workers.

Best Regards
+Fasih

Posted at Nginx Forum:

faskiri.devel · November 15, 2011, 2:26pm

On Nov 15, 2011, at 3:37 PM, faskiri.devel wrote:

how will the new requests be handed off to workers.
Yes, accept_mutex off. What accept mutex does is trying to prevent
workers from competing over accept from listening sockets (in the
kernel). In other words, without accept mutex workers may try to
simultaneously check for new events on sockets which may lead to a
slight increase in CPU usage.

Depending on your OS and the event notification mechanisms the results
may vary.

Actually it’s quite safe to try it and we’d appreciate your feedback
here!

And as I mentioned, we’ve been working on fixing the situation with
crashed workers and mutex lock-ups.

faskiri.devel · November 16, 2011, 9:51am

Thanks for your attention!

It works fine with accept_mutex off, I will run my stress test harness
over the weekend to see the impact on the performance. If there is a
significant difference in performance, will surely update the thread
with the same.

For my understanding, I had implemented a workaround to get around this
problem. Is your solution along the same line?

My patch:

diff --git a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
index c57d37e…a6ed725 100644
— a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.c
@@ -49,6 +49,10 @@ ngx_atomic_t *ngx_connection_counter =
&connection_counter;

ngx_atomic_t *ngx_accept_mutex_ptr;
ngx_shmtx_t ngx_accept_mutex;
+// This is shared var protected by ngx_use_accept_mutex. Access only
when
+// ngx_accept_mutex is held. The var stores the PID of the process
currently
+// holding the mutex
+ngx_pid_t *ngx_accept_mutex_held_by;
ngx_uint_t ngx_use_accept_mutex;
ngx_uint_t ngx_accept_events;
ngx_uint_t ngx_accept_mutex_held;
@@ -254,6 +258,7 @@ ngx_process_events_and_timers(ngx_cycle_t *cycle)
}

 if (ngx_accept_mutex_held) {

   *ngx_accept_mutex_held_by = 0;
   ngx_shmtx_unlock(&ngx_accept_mutex);

}

@@ -526,6 +531,9 @@ ngx_event_module_init(ngx_cycle_t *cycle)
{
return NGX_ERROR;
}

// cl = 128 bytes are available for us to use. ngx_shmtx_create
uses
// ngx_atomic_t bytes to assign to mutex->lock, using the memory
after that
ngx_accept_mutex_held_by = (ngx_pid_t*) (shared +
sizeof(ngx_atomic_t));

ngx_connection_counter = (ngx_atomic_t *) (shared + 1 * cl);

diff --git a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
index 778da52…f1b06d4 100644
— a/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event.h
@@ -501,6 +501,7 @@ extern ngx_atomic_t
*ngx_connection_counter;

extern ngx_atomic_t *ngx_accept_mutex_ptr;
extern ngx_shmtx_t ngx_accept_mutex;
+extern ngx_pid_t *ngx_accept_mutex_held_by;
extern ngx_uint_t ngx_use_accept_mutex;
extern ngx_uint_t ngx_accept_events;
extern ngx_uint_t ngx_accept_mutex_held;
diff --git
a/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
b/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
index 2355d1b…feb4568 100644
— a/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
+++ b/service/nginxServer/nginx-1.0.5/src/event/ngx_event_accept.c
@@ -298,6 +298,10 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
ngx_log_debug0(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
“accept mutex locked”);

   *ngx_accept_mutex_held_by = ngx_pid;

   // If the mutex was already held by me and we are using

RTSIG_EVENT, no

   // need to enable accept_events

   if (ngx_accept_mutex_held
       && ngx_accept_events == 0
       && !(ngx_event_flags & NGX_USE_RTSIG_EVENT))

@@ -306,6 +310,8 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
}

     if (ngx_enable_accept_events(cycle) == NGX_ERROR) {

       // No one is holding the mutex now

       *ngx_accept_mutex_held_by = 0;
       ngx_shmtx_unlock(&ngx_accept_mutex);
       return NGX_ERROR;
   }

@@ -317,8 +323,9 @@ ngx_trylock_accept_mutex(ngx_cycle_t *cycle)
}

 ngx_log_debug1(NGX_LOG_DEBUG_EVENT, cycle->log, 0,

              "accept mutex lock failed: %ui",

ngx_accept_mutex_held);

              "accept mutex lock failed: held by: %ui",

*ngx_accept_mutex_held_by);

// If I held it earlier, but not anymore (ngx_trylock_accept_mutex
failed)
if (ngx_accept_mutex_held) {
if (ngx_disable_accept_events(cycle) == NGX_ERROR) {
return NGX_ERROR;
diff --git a/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
b/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
index 6055587…b66d4b3 100644
— a/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
+++ b/service/nginxServer/nginx-1.0.5/src/os/unix/ngx_process.c
@@ -492,17 +492,18 @@ ngx_process_get_status(void)
}

```
   if (ngx_accept_mutex_ptr) {
```
```
       /*
```

        * unlock the accept mutex if the abnormally exited

process

```
        * held it
```
```
        */
```

       ngx_atomic_cmp_set(ngx_accept_mutex_ptr, pid, 0);

   // If the accept mutex is held by the abnormally exited

process

   // Note: If the process holding this has died, the mutex cannot

be

   // acquired by someone else, in which case,

ngx_accept_mutex_held is

```
   // free to be accessed
```

   if (ngx_accept_mutex_held_by != NULL && pid ==

*ngx_accept_mutex_held_by) {

       ngx_log_error(NGX_LOG_INFO, ngx_cycle->log, 0,

           "PID %P held the accept mutex. Releasing", pid);

       // Reset the value before unlocking

```
       *ngx_accept_mutex_held_by = 0;
```

       ngx_shmtx_unlock(&ngx_accept_mutex);
   }

   one = 1;
   process = "unknown process";

–
1.7.1

Posted at Nginx Forum: