Hangup in epoll

Hello, with recent versions of nginx (0.8.47 - 0.8.49) we have had hangups
of all nginx processes, which could not even be killed. This happened
during low-traffic hours. In the kernel logs we found:

Sep 7 05:02:00 www06 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000020
Sep 7 05:02:00 www06 kernel: printing eip:
Sep 7 05:02:00 www06 kernel: c0256a3d
Sep 7 05:02:00 www06 kernel: *pde = 32919001
Sep 7 05:02:00 www06 kernel: Oops: 0000 [#1]
Sep 7 05:02:00 www06 kernel: SMP
Sep 7 05:02:00 www06 kernel: last sysfs file:
/devices/pci0000:00/0000:00:0a.0/0000:02:02.1/irq
Sep 7 05:02:00 www06 kernel: Modules linked in: nls_utf8 nfs lockd
nfs_acl sunrpc cpufreq_ondemand cpufreq_userspace ipt_TCPMSS
cpufreq_powersave xt_limit xt_tcpudp xt_state powernow_k8 ipt_LOG
ipt_recent freq_table iptable_nat ip_nat ip_conntrack nfnetlink
iptable_filter ip_tables x_tables ipv6 dock button battery ac
xfs_quota xfs loop dm_mod i2c_amd756 ide_cd ohci_hcd ehci_hcd cdrom
i2c_core mptctl hw_random usbcore tg3 ext3 jbd edd fan thermal
processor sg mptspi mptscsih mptbase scsi_transport_spi amd74xx sd_mod
scsi_mod ide_disk ide_core
Sep 7 05:02:00 www06 kernel: CPU: 1
Sep 7 05:02:00 www06 kernel: EIP: 0060:[] Not tainted
VLI
Sep 7 05:02:00 www06 kernel: EFLAGS: 00210246
(2.6.16.60-0.66.1-bigsmp #1)
Sep 7 05:02:00 www06 kernel: EIP is at sock_poll+0x9/0xe
Sep 7 05:02:00 www06 kernel: eax: f447f980 ebx: 00000000 ecx:
00000000 edx: e936f900
Sep 7 05:02:00 www06 kernel: esi: cd8f9e2c edi: f5cb4540 ebp:
cd8f9e00 esp: e8b0ff60
Sep 7 05:02:00 www06 kernel: ds: 007b es: 007b ss: 0068
Sep 7 05:02:00 www06 kernel: Process nginx (pid: 17585,
threadinfo=e8b0e000 task=c968ed10)
Sep 7 05:02:00 www06 kernel: Stack: <0>00000200 c018ced6 083d1b88
00000000 defa3080 7fffffff f10cd21c c94fe5c0
Sep 7 05:02:00 www06 kernel: 00000000 00000000 c0181a69
ccbd5380 f45f3740 c0167047 e8b0ff98 e8b0ff98
Sep 7 05:02:00 www06 kernel: f5cb4550 f5cb4550 00000008
ffffffff 00000003 e8b0e000 c0103dcb 00000008
Sep 7 05:02:00 www06 kernel: Call Trace:
Sep 7 05:02:00 www06 kernel: [] sys_epoll_wait+0x246/0x3f8
Sep 7 05:02:00 www06 kernel: [] mntput_no_expire+0x13/0x76
Sep 7 05:02:00 www06 kernel: [] filp_close+0x4e/0x54
Sep 7 05:02:00 www06 kernel: [] sysenter_past_esp+0x54/0x79
Sep 7 05:02:00 www06 kernel: Code: d8 89 43 10 0f 20 e0 89 43 14 5b
c3 b8 20 53 42 c0 e9 bd ff ff ff b8 01 00 00 00 c3 b8 fa ff ff ff c3
53 89 d1 8b 50 78 8b 5a 08 53 20 5b c3 53 89 d1 8b 50 78 8b 5a 08
ff 53 40 5b c3 53 8b

The system is SUSE Linux Enterprise Server 10 (i586),
Kernel Linux version 2.6.16.60-0.66.1-bigsmp (geeko@buildhost) (gcc
version 4.1.2 20070115 (SUSE Linux)) #1 SMP Fri May 28 12:10:21 UTC
2010
nginx version 0.8.49.
We have now switched to the poll event method for the time being, as
this is a low-traffic site.
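
For readers who want to apply the same workaround, switching the event method is done with the `use` directive in the `events` block of nginx.conf. The fragment below is only a minimal sketch: the `worker_connections` value is an assumption and was not given in the original post.

```nginx
events {
    # Use the poll event method instead of epoll.
    use poll;

    # Assumed value for illustration; not taken from the original post.
    worker_connections 1024;
}
```

This only avoids the epoll code path in the kernel; it does not fix the underlying kernel problem.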

The hangup had happened twice before, but only sporadically, and it could
not be correlated with specific requests. Is there any more information we
could provide? Also, a hint on how to avoid a full reboot of the machine
in such situations would be appreciated.

With regards,

__Janko H.

On Tue, Sep 07, 2010 at 10:25:30AM +0200, jhauser wrote:

> We have now switched to the poll event method for the time being, as
> this is a low-traffic site.
>
> The hangup had happened twice before, but only sporadically, and it could
> not be correlated with specific requests. Is there any more information we
> could provide? Also, a hint on how to avoid a full reboot of the machine
> in such situations would be appreciated.

As I understand it, this is not a “hangup of all nginx processes, which
could not even be killed”, but a kernel crash. This is not an nginx issue;
it is a Linux kernel bug.


Igor S.
http://sysoev.ru/en/