Call ngx_localtime may cause deadlock?

I am writing a module based on nginx 0.7.65, ngx_localtime() is called
in the module.
Problem is, if set “timer_resolution 100ms;” in config file, nginx
workers(*4) will hangs randomly. then if “timer_resolution” command was
turned off , everything works just fine.

here is the backtrace (coredump generated with gcore(1) on a hanged
worker process):

#0 0x007347a2 in ?? ()
#1 0x00823a9e in pthread_cond_init@@GLIBC_2.3.2 () from
/lib/tls/libc.so.6
#2 0x007c942d in __wcsmbs_named_conv () from /lib/tls/libc.so.6
#3 0x080611c6 in ngx_localtime (s=1267569890, tm=0x9ba2048) at
src/os/unix/ngx_time.c:59
#4 0x08055223 in ngx_time_update (sec=1267569890, msec=5) at
src/core/ngx_times.c:114
#5 0x0805e51a in ngx_timer_signal_handler (signo=14) at
src/event/ngx_event.c:565
#6 0x00775898 in __open_catalog () from /lib/tls/libc.so.6
#7 0x007cc043 in __tzfile_read () from /lib/tls/libc.so.6
#8 0x007cb217 in tzset_internal () from /lib/tls/libc.so.6
#9 0x007c942d in __wcsmbs_named_conv () from /lib/tls/libc.so.6
#10 0x080611c6 in ngx_localtime (s=1267569889, tm=0x875ff4) at
src/os/unix/ngx_time.c:59
#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450,
t=-4) at ./ngx_foo_module//uti.c:111
#12 0x08087d1a in ngx_http_foo_record (r=0x9dcdc28,
request_arg=0x9dcddb0) at ./ngx_foo_module//foo.c:124
#13 0x0808788c in ngx_http_foo_handler (r=0x9dcdc28) at
./ngx_foo_module//ngx_http_foo_module.c:74
#14 0x0806d4f9 in ngx_http_core_content_phase (r=0x9dcdc28,
ph=0x9bb9938) at src/http/ngx_http_core_module.c:1262

strace shows (not on the same worker, but similar) :

futex(0x2c2eec, FUTEX_WAIT, 2, NULL)

I guess this situation is a deadlock, happens beneath localtime_r() of
glibc.
While my module is calling localtime_r, a time update signal trigged
localtime at the same time, then deadlocked. Is this correct?

Now I employ ngx_gmtime() to get formated time instead of
ngx_localtime(), works stable for weeks. Maybe ngx_localtime() should
not be used outside ngx_time_update().

Posted at Nginx Forum:

On Thu, Mar 04, 2010 at 03:14:53PM +0300, Maxim D. wrote:

#7 0x007cc043 in __tzfile_read () from /lib/tls/libc.so.6

eventport) nginx uses localtime_r() in signal handler. This isn’t
it’s not glibc’s fault.
Here is third-party module which probably uses threads:

#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450,
t=-4) at ./ngx_foo_module//uti.c:111
#12 0x08087d1a in ngx_http_foo_record (r=0x9dcdc28,
request_arg=0x9dcddb0) at ./ngx_foo_module//foo.c:124
#13 0x0808788c in ngx_http_foo_handler (r=0x9dcdc28) at
./ngx_foo_module//ngx_http_foo_module.c:74

Now I employ ngx_gmtime() to get formated time instead of
ngx_localtime(), works stable for weeks. Maybe ngx_localtime()
should not be used outside ngx_time_update().

You should use ngx_time()/ngx_timeofday() instead in your module.


Igor S.
http://sysoev.ru/en/

Hello!

On Thu, Mar 04, 2010 at 05:09:33AM -0500, georgexsh wrote:

#0 0x007347a2 in ?? ()
#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450, t=-4) at ./ngx_foo_module//uti.c:111
localtime_r() of glibc.
While my module is calling localtime_r, a time update signal
trigged localtime at the same time, then deadlocked. Is this
correct?

Yes, your analysis is correct. With timer_resolution set and for
event methods which does not support timers (all but kqueue,
eventport) nginx uses localtime_r() in signal handler. This isn’t
generally safe thing to do as POSIX doesn’t list it as
async-signal-safe function. Any other usage of localtime_r() in
the same process may cause deadlock.

In fact it may even deadlock by itself as soon as new
signal arrives before localtime_r() will finish execution (though
unlikely).

I’m a bit surprised that this actually happens as nginx isn’t
threaded and I see no reason for glibc to use locks. But anyway
it’s not glibc’s fault.

Now I employ ngx_gmtime() to get formated time instead of
ngx_localtime(), works stable for weeks. Maybe ngx_localtime()
should not be used outside ngx_time_update().

You should use ngx_time()/ngx_timeofday() instead in your module.

Maxim D.

Hello!

On Thu, Mar 04, 2010 at 10:20:29AM -0500, georgexsh wrote:

(r=0x9dcdc28, request_arg=0x9dcddb0) at
http://blogs.inxsasia.com/hiran/2009/07/stupid-futex-and-glibc.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511

Ok, second link suggest that recent glibc uses futex even when
there are no threads in application.

This isn’t really relevant though as underlying problem is in
nginx and should be fixed anyway. Only async-signal-safe
functions may be used in signal handlers, anything else leads to
undefined behaviour.

For now - use ngx_time() and don’t use timer_resolution unless you
have kqueue or eventport.

Maxim D.

On Thu, Mar 04, 2010 at 07:49:57PM +0300, Maxim D. wrote:

#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28,

nginx and should be fixed anyway. Only async-signal-safe
functions may be used in signal handlers, anything else leads to
undefined behaviour.

Strickly speaking the generic signal handler calls localtime/_r() too:
ngx_time_update(0, 0);

And if timer_resolution bug can be fixed easy, nevertheless, I do not
know how to log local time in singal handler without localtime().


Igor S.
http://sysoev.ru/en/

Hello!

On Thu, Mar 04, 2010 at 10:54:56PM +0300, Igor S. wrote:

On Thu, Mar 04, 2010 at 07:49:57PM +0300, Maxim D. wrote:

[…]

This isn’t really relevant though as underlying problem is in
nginx and should be fixed anyway. Only async-signal-safe
functions may be used in signal handlers, anything else leads to
undefined behaviour.

Strickly speaking the generic signal handler calls localtime/_r() too:
ngx_time_update(0, 0);

And if timer_resolution bug can be fixed easy, nevertheless, I do not
know how to log local time in singal handler without localtime().

Well, I see 3 basic options:

  1. Handle signals in the event loop (at least kqueue supports it,
    for other methods we may use pipe() to deliver signals to event
    loop safely). This is basically how libevent does this, btw.

  2. Limit signal handlers to use only async-signal-safe functions,
    and nothing more. This means that time won’t be correct (no
    localtime()), errno (if any) won’t be translated to text (no
    strerror_r()) etc.

  3. Pretend we never talked about it and problems shouldn’t happen
    as generic signals are rare (and we already try to follow (2),
    though not completely).

Maxim D.

Igor S. Wrote:

(r=0x9dcdc28) at
./ngx_foo_module//ngx_http_foo_module.c:74

No, ngx_foo_module/* is my module which does’t use threads ethier.

ref:
http://blogs.inxsasia.com/hiran/2009/07/stupid-futex-and-glibc.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511

Posted at Nginx Forum: