Forum: NGINX Floating Point exception w/ down server in upstream

JacobSingh (Guest)
on 2009-04-30 12:38
(Received via mailing list)
Hello,

I've been using nginx pretty successfully for a while, and this problem
started recently.  I've got CentOS 5 and the nginx from yum
(nginx/0.6.32).

Here is my config:

 upstream search1.us_seach {
        server slave1.search1.xxxxxx.us:8080 weight=3 max_fails=40 fail_timeout=20s;
        server master.search1.xxxxxx.us:8080 weight=1 max_fails=0;
 }

When the master server is down, instead of failing over to the slave
server (as expected), I get this:

2009/04/30 02:23:34  15116#0: *99 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /mypath/to/something HTTP/1.1", upstream: "http://11.111.111.11:8080//mypath/to/something", host: "localhost:81"
2009/04/30 02:23:34  13227#0: signal 17 (SIGCHLD) received
2009/04/30 02:23:34  13227#0: worker process 15116 exited on signal 8
2009/04/30 02:23:34  13227#0: start worker process 16779

Here 11.111.111.11 is master.search1.xxxxxx.us.

It never even tries the slave server because it just bails right there.

I made an strace of the problem, it is available here:
http://pastebin.ca/1408224

Here is the very end of it:

gettimeofday({1241072060, 198274}, NULL) = 0
getsockopt(20, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
write(9, "2009/04/30 02:14:20  1322"..., 335) = 335
--- SIGFPE (Floating point exception) @ 0 (0) ---
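The log and the strace describe the same event: the master receives SIGCHLD (signal 17 on Linux), reaps the dead worker, and reports that it was killed by signal 8, which on Linux is SIGFPE. A minimal sketch, assuming a POSIX system, of how a parent observes a child killed by SIGFPE through waitpid(); the function name child_death_signal is hypothetical and this is not nginx's own code.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork()/waitpid() under strict C modes */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that dies on SIGFPE, as the worker did,
 * then report the terminating signal the parent sees, similar to the
 * "worker process ... exited on signal N" log line.
 * Returns the signal number, or -1 if the child did not die on a signal. */
int child_death_signal(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* fork failed */
    }
    if (pid == 0) {
        signal(SIGFPE, SIG_DFL);        /* default action terminates the process */
        raise(SIGFPE);                  /* simulate the worker's crash */
        _exit(0);                       /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) { /* reap the child, as a master would */
        return -1;
    }
    if (WIFSIGNALED(status)) {
        printf("worker process %d exited on signal %d\n",
               (int) pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    return -1;
}
```

On Linux, calling child_death_signal() prints a line ending in "exited on signal 8" and returns SIGFPE.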

I tried compiling 0.6.7.  The problem remains, but it is slightly less severe.
Now it sometimes actually tries the slave server first, but it still dies
every time it tries the master instead of failing over to the slave.

Thank you!
Jacob

Posted at Nginx Forum:
http://forum.nginx.org/read.php?2,1581,1581#msg-1581
Igor Sysoev (Guest)
on 2009-04-30 13:28
(Received via mailing list)
On Thu, Apr 30, 2009 at 06:28:16AM -0400, JacobSingh wrote:

>         server   master.search1.xxxxxx.us:8080 weight=1 max_fails=0;
>
> getsockopt(20, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
> write(9, "2009/04/30 02:14:20  1322"..., 335) = 335
> --- SIGFPE (Floating point exception) @ 0 (0) ---
>
> I tried compiling 0.6.7.  The problem remains, but is slightly better.
> Now, it will actually try the slave server first sometimes, but it will
> still die every time it tries the master, not falling over to the slave.

The problem is due to

        max_fails=0;

It was fixed in 0.6.33:

Changes with nginx 0.6.33                                        20 Nov 2008

    *) Bugfix: if the "max_fails=0" parameter was used in upstream with
       several servers, then a worker process exited on a SIGFPE signal.
       Thanks to Maxim Dounin.
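A SIGFPE that depends on max_fails=0 is consistent with an integer division by max_fails in the failure-accounting path; in C that division is undefined behavior and on x86 Linux it typically kills the process with SIGFPE. The following is a hedged sketch assuming that mechanism (the thread does not confirm it, and this is not nginx's actual source): the struct peer_t, its fields, and record_failure are hypothetical, and the point is simply to skip the weight penalty when max_fails is 0.

```c
#include <assert.h>

/* Hypothetical peer record, loosely modelled on one "server" line. */
typedef struct {
    int weight;          /* "weight=" parameter */
    int max_fails;       /* "max_fails=" parameter; 0 disables failure accounting */
    int fails;           /* failures counted so far */
    int current_weight;  /* hypothetical runtime weight used for balancing */
} peer_t;

/* Hypothetical helper called after a failed connection attempt. */
void record_failure(peer_t *peer)
{
    peer->fails++;

    /* Guard the division: an unguarded "weight / max_fails" would divide
       by zero for max_fails=0. */
    if (peer->max_fails != 0) {
        peer->current_weight -= peer->weight / peer->max_fails;
        if (peer->current_weight < 0) {
            peer->current_weight = 0;   /* never go negative */
        }
    }
}
```

With this guard, a peer configured like the master above (weight=1 max_fails=0) only has its failure counted, and its weight is left unchanged.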