Forum: NGINX call ngx_localtime may cause deadlock?

Posted by georgexsh (Guest)
on 2010-03-04 11:10
(Received via mailing list)
I am writing a module based on nginx 0.7.65, ngx_localtime() is called 
in the module.
Problem is, if set "timer_resolution  100ms;" in config file, nginx 
workers(*4) will hangs randomly. then if "timer_resolution" command was 
turned off , everything works just fine.

here is the backtrace (coredump generated with gcore(1) on a hanged 
worker process):

#0  0x007347a2 in ?? ()
#1  0x00823a9e in pthread_cond_init@@GLIBC_2.3.2 () from 
/lib/tls/libc.so.6
#2  0x007c942d in __wcsmbs_named_conv () from /lib/tls/libc.so.6
#3  0x080611c6 in ngx_localtime (s=1267569890, tm=0x9ba2048) at 
src/os/unix/ngx_time.c:59
#4  0x08055223 in ngx_time_update (sec=1267569890, msec=5) at 
src/core/ngx_times.c:114
#5  0x0805e51a in ngx_timer_signal_handler (signo=14) at 
src/event/ngx_event.c:565
#6  0x00775898 in __open_catalog () from /lib/tls/libc.so.6
#7  0x007cc043 in __tzfile_read () from /lib/tls/libc.so.6
#8  0x007cb217 in tzset_internal () from /lib/tls/libc.so.6
#9  0x007c942d in __wcsmbs_named_conv () from /lib/tls/libc.so.6
#10 0x080611c6 in ngx_localtime (s=1267569889, tm=0x875ff4) at 
src/os/unix/ngx_time.c:59
#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450, 
t=-4) at ./ngx_foo_module//uti.c:111
#12 0x08087d1a in ngx_http_foo_record (r=0x9dcdc28, 
request_arg=0x9dcddb0) at ./ngx_foo_module//foo.c:124
#13 0x0808788c in ngx_http_foo_handler (r=0x9dcdc28) at 
./ngx_foo_module//ngx_http_foo_module.c:74
#14 0x0806d4f9 in ngx_http_core_content_phase (r=0x9dcdc28, 
ph=0x9bb9938) at src/http/ngx_http_core_module.c:1262


strace shows  (not on the same worker, but similar) :

futex(0x2c2eec, FUTEX_WAIT, 2, NULL)


I guess this situation is a deadlock, happens beneath localtime_r() of 
glibc.
While my module is calling localtime_r, a time update signal trigged 
localtime at the same time, then deadlocked. Is this correct?

Now I employ ngx_gmtime() to get formated time instead of 
ngx_localtime(), works stable for weeks. Maybe ngx_localtime() should 
not be used outside ngx_time_update().

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,59851,59851#msg-59851
Posted by Maxim Dounin (Guest)
on 2010-03-04 13:15
(Received via mailing list)
Hello!

On Thu, Mar 04, 2010 at 05:09:33AM -0500, georgexsh wrote:

> #0  0x007347a2 in ?? ()
> #11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450, t=-4) at ./ngx_foo_module//uti.c:111
> localtime_r() of glibc.
> While my module is calling localtime_r, a time update signal 
> trigged localtime at the same time, then deadlocked. Is this 
> correct?

Yes, your analysis is correct.  With timer_resolution set and for
event methods which does not support timers (all but kqueue,
eventport) nginx uses localtime_r() in signal handler.  This isn't
generally safe thing to do as POSIX doesn't list it as
async-signal-safe function.  Any other usage of localtime_r() in
the same process may cause deadlock.

In fact it may even deadlock by itself as soon as new
signal arrives before localtime_r() will finish execution (though
unlikely).

I'm a bit surprised that this actually happens as nginx isn't
threaded and I see no reason for glibc to use locks.  But anyway
it's not glibc's fault.

> Now I employ ngx_gmtime() to get formated time instead of 
> ngx_localtime(), works stable for weeks. Maybe ngx_localtime() 
> should not be used outside ngx_time_update().

You should use ngx_time()/ngx_timeofday() instead in your module.

Maxim Dounin
Posted by Igor Sysoev (Guest)
on 2010-03-04 13:22
(Received via mailing list)
On Thu, Mar 04, 2010 at 03:14:53PM +0300, Maxim Dounin wrote:

> > 
> > #7  0x007cc043 in __tzfile_read () from /lib/tls/libc.so.6
> > 
> eventport) nginx uses localtime_r() in signal handler.  This isn't
> it's not glibc's fault.
Here is third-party module which probably uses threads:

#11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28, time_str=0xbfe18450, 
t=-4) at ./ngx_foo_module//uti.c:111
#12 0x08087d1a in ngx_http_foo_record (r=0x9dcdc28, 
request_arg=0x9dcddb0) at ./ngx_foo_module//foo.c:124
#13 0x0808788c in ngx_http_foo_handler (r=0x9dcdc28) at 
./ngx_foo_module//ngx_http_foo_module.c:74

> > Now I employ ngx_gmtime() to get formated time instead of 
> > ngx_localtime(), works stable for weeks. Maybe ngx_localtime() 
> > should not be used outside ngx_time_update().
> 
> You should use ngx_time()/ngx_timeofday() instead in your module.


--
Igor Sysoev
http://sysoev.ru/en/
Posted by georgexsh (Guest)
on 2010-03-04 16:20
(Received via mailing list)
Igor Sysoev Wrote:
-------------------------------------------------------

> (r=0x9dcdc28) at
> ./ngx_foo_module//ngx_http_foo_module.c:74
> 


No, ngx_foo_module/* is my module which does't use threads ethier.

ref:
http://blogs.inxsasia.com/hiran/2009/07/stupid-futex-and-glibc.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511

Posted at Nginx Forum: 
http://forum.nginx.org/read.php?2,59851,59942#msg-59942
Posted by Maxim Dounin (Guest)
on 2010-03-04 17:50
(Received via mailing list)
Hello!

On Thu, Mar 04, 2010 at 10:20:29AM -0500, georgexsh wrote:

> > (r=0x9dcdc28, request_arg=0x9dcddb0) at
> http://blogs.inxsasia.com/hiran/2009/07/stupid-futex-and-glibc.html
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511

Ok, second link suggest that recent glibc uses futex even when
there are no threads in application.

This isn't really relevant though as underlying problem is in
nginx and should be fixed anyway.  Only async-signal-safe
functions may be used in signal handlers, anything else leads to
undefined behaviour.

For now - use ngx_time() and don't use timer_resolution unless you
have kqueue or eventport.

Maxim Dounin
Posted by Igor Sysoev (Guest)
on 2010-03-04 20:55
(Received via mailing list)
On Thu, Mar 04, 2010 at 07:49:57PM +0300, Maxim Dounin wrote:

> > > #11 0x08088c9f in ngx_foo_time_str (r=0x9dcdc28,
> > 
> nginx and should be fixed anyway.  Only async-signal-safe 
> functions may be used in signal handlers, anything else leads to 
> undefined behaviour.

Strickly speaking the generic signal handler calls localtime/_r() too:
    ngx_time_update(0, 0);

And if timer_resolution bug can be fixed easy, nevertheless, I do not
know how to log local time in singal handler without localtime().


--
Igor Sysoev
http://sysoev.ru/en/
Posted by Maxim Dounin (Guest)
on 2010-03-05 01:59
(Received via mailing list)
Hello!

On Thu, Mar 04, 2010 at 10:54:56PM +0300, Igor Sysoev wrote:

> On Thu, Mar 04, 2010 at 07:49:57PM +0300, Maxim Dounin wrote:

[...]

> > This isn't really relevant though as underlying problem is in 
> > nginx and should be fixed anyway.  Only async-signal-safe 
> > functions may be used in signal handlers, anything else leads to 
> > undefined behaviour.
> 
> Strickly speaking the generic signal handler calls localtime/_r() too:
>     ngx_time_update(0, 0);
> 
> And if timer_resolution bug can be fixed easy, nevertheless, I do not
> know how to log local time in singal handler without localtime().

Well, I see 3 basic options:

1. Handle signals in the event loop (at least kqueue supports it,
for other methods we may use pipe() to deliver signals to event
loop safely).  This is basically how libevent does this, btw.

2. Limit signal handlers to use only async-signal-safe functions,
and nothing more.  This means that time won't be correct (no
localtime()), errno (if any) won't be translated to text (no
strerror_r()) etc.

3. Pretend we never talked about it and problems shouldn't happen
as generic signals are rare (and we already try to follow (2),
though not completely).

Maxim Dounin
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.