Sudden nginx hang -- restart fails, "98: Address already in use"

Hello, all.

About an hour ago, out of the blue, my server stopped responding to webpage
requests. We are using nginx + php-fpm on RHEL6.
service nginx status

nginx (pid 31600) is running…

service nginx restart

Stopping nginx: [FAILED]
Starting nginx: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
nginx: [emerg] still could not bind()

I killed the process and was able to restart nginx so the immediate crisis
is over, but I need to know: What the hell happened? What would cause nginx
to hang like this? I have googled around and I see several discussions about
what to do when this happens but zilch about how to keep it from happening.

  • Dave


On 5 February 2014 18:53, dwirth [email protected] wrote:

Starting nginx: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] still could not bind()

I killed the process and was able to restart nginx so the immediate crisis
is over, but I need to know: What the hell happened? What would cause nginx
to hang like this? I have googled around and I see several discussions about
what to do when this happens but zilch about how to keep it from happening.

The underlying cause I can’t help with but, in this situation, I’d
always do a separate stop then start, so I could make sure the service
had actually stopped before starting it again. It hadn’t stopped here,
and that’s what caused your “Address already in use” error messages.
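
For reference, a rough version of that sequence might look like the
following; the grep patterns and the <pid> placeholder are only
illustrative, and netstat assumes the usual RHEL6 net-tools (run it as
root so -p can show the owning process):

service nginx stop
# confirm nothing is still listening on the web ports before starting again
netstat -lntp | grep -E ':(80|443) '   # should print nothing
# if an old nginx master or worker is still bound, find it and kill it
ps alx | grep '[n]ginx'
kill <pid>   # fall back to kill -9 only if it refuses to exit
service nginx start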

J

Hello!

On Wed, Feb 05, 2014 at 01:53:14PM -0500, dwirth wrote:

Starting nginx: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] still could not bind()

I killed the process and was able to restart nginx so the immediate crisis
is over, but I need to know: What the hell happened? What would cause nginx
to hang like this? I have googled around and I see several discussions about
what to do when this happens but zilch about how to keep it from happening.

Such hangs can be caused either by bugs (in nginx itself or in
3rd party modules; take a look at nginx -V to find out how many
3rd party modules you have) or by some serious blocking at the
OS level. E.g., serving files from an NFS share can easily result
in such a hang if something happens to the NFS server.
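
As a quick, illustrative check (nginx -V writes to stderr, hence the
redirect), this lists the configure arguments nginx was built with;
any --add-module=... entries are 3rd party modules:

nginx -V 2>&1 | tr ' ' '\n' | grep -- '--add-module'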

It is impossible to say what happened in your case without
additional information (at least “ps alx” output would be
helpful; see also the Debugging page in the nginx documentation).
General recommendations are:

  1. Make sure your nginx is up-to-date. Note that some Linux
    distros ship quite outdated versions in their repositories, so
    check your version against nginx.org. Current versions are
    1.5.10 (mainline) and 1.4.4 (stable).

  2. Make sure you aren’t using things that can easily block, like
    NFS or other network filesystems, or some blocking code in
    embedded languages like embedded perl or lua. Or, if you do use
    them, expect anything to die if something bad happens.

  3. If you are using 3rd party modules, make sure you have good
    reasons to do so.

  4. Examine logs for “crit”, “alert”, and “emerg” messages. If there
    are any, they require investigation, especially messages about
    worker processes “exited on signal” (see the example after this
    list).
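
For that last point, assuming the default RHEL log location (adjust
the path to wherever your error_log directive points), something like:

grep -E '\[(crit|alert|emerg)\]' /var/log/nginx/error.log
grep 'exited on signal' /var/log/nginx/error.log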


Maxim D.
http://nginx.org/

Hello!

On Thu, Feb 06, 2014 at 07:23:51AM -0500, dwirth wrote:

Thanks. I am fairly certain (?) at this point that NFS is the culprit. I had
a lot of trouble unmounting one of my NFS directories. Eventually I resorted
to rebooting, at which point it went into a permanent hang until a reboot
was forced via hypervisor.

Well, if you use NFS, that perfectly explains the observed behaviour.

Is this particular situation, where NFS causes nginx to shut down, specific
to nginx? We just switched from apache to nginx at the start of the year. I
didn’t have NFS problems before that. I don’t know if that’s coincidence or
not.

Basic NFS problems are the same regardless of the software you use: if
something goes wrong, it blocks the processes trying to access the NFS
share. With nginx, the results are usually a bit more severe than
with process-based servers like Apache, because a) blocking an
nginx worker process affects multiple requests, and b) blocking
all nginx processes is easier, as there are typically only a small
number of nginx worker processes.
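
A generic way to see this sort of blocking in practice: processes
stuck on NFS I/O normally show up in state “D” (uninterruptible sleep)
in ps output, e.g.

ps axo pid,stat,wchan,comm | awk '$2 ~ /D/'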

At any rate, my takeaway from this: nginx + NFS = bad.

I would take nginx out of the equation.

If you are going to use NFS it may be a good idea to make sure
you’ve read and understood all the mount options NFS has. In
particular, it is believed that using the “soft” option and small
timeouts may help a bit. I wouldn’t recommend using NFS at all
though.
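
For illustration only (the server name, export and mount point below
are made up, and the timeout values are just a starting point), a soft
mount with short timeouts looks something like:

mount -t nfs -o soft,timeo=30,retrans=2 nfsserver:/export/www /var/www/shared

With “soft”, a stalled request eventually returns an error to the
application instead of blocking it forever; the cost is that reads and
writes can fail instead of merely stalling, which is the trade-off
described above.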


Maxim D.
http://nginx.org/

Thanks. I am fairly certain (?) at this point that NFS is the culprit. I had
a lot of trouble unmounting one of my NFS directories. Eventually I resorted
to rebooting, at which point it went into a permanent hang until a reboot
was forced via hypervisor.

Is this particular situation, where NFS causes nginx to shut down, specific
to nginx? We just switched from apache to nginx at the start of the year. I
didn’t have NFS problems before that. I don’t know if that’s coincidence or
not.

At any rate, my takeaway from this: nginx + NFS = bad.
