Segfault - nginx 0.7.68

We’re running nginx 0.7.68 w/ the s3-proxy patch applied, and we’re seeing
reproducible segfaults when we hit (or miss) certain files:

Nicholas-Tangs-MacBook-Air:~ ntang$ curl http://[server]/path/to/test.txt
curl: (52) Empty reply from server

And here’s what happens on the server:

[root@server nginx]# strace -p 9742
Process 9742 attached - interrupt to quit
write(26, "2010/12/20 11:23:52 [info] 9742#"…, 84) = 84
epoll_wait(5, {{EPOLLIN, {u32=857063440, u64=139771977777168}}}, 512, 4294967295) = 1
accept(6, {sa_family=AF_INET, sin_port=htons(44473), sin_addr=inet_addr("myip")}, [29640703181062160]) = 3
ioctl(3, FIONBIO, [1]) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLET, {u32=857064145, u64=139771977777873}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=857064145, u64=139771977777873}}}, 512, 5000) = 1
recvfrom(3, "GET /path/to/…"…, 1024, 0, NULL, NULL) = 210
stat("/path/to/file/test.txt", 0x7fff3b23df20) = -1 ENOENT (No such file or directory)
epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=857064145, u64=139771977777873}}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 9742 detached
[root@server nginx]#

Has anyone seen similar behavior? We’ve got a series of location/rewrite/try_files
blocks that look for the files. Basically, it works like this:

  • look on the main filesystem
  • look on the backup filesystem
  • look on the backup backup filesystem
  • look on S3

In the case of the request above, the process segfaulted and died after step 1.
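The four-step lookup described above (main, backup, backup backup, then S3) is a plain fallback chain, which is what the chained try_files blocks implement. The Python sketch below models only that ordering, not nginx’s actual request handling; the root directories are hypothetical placeholders, and `exists` is injectable so the order can be checked without real mounts.

```python
import os
from typing import Callable, Sequence, Tuple


def resolve(relpath: str,
            roots: Sequence[str],
            exists: Callable[[str], bool] = os.path.isfile) -> Tuple[str, str]:
    """Walk the local roots in order; fall back to S3 if none has the file.

    Returns ("local", full_path) for the first root that contains relpath,
    or ("s3", relpath) when every local lookup misses.
    """
    for root in roots:
        # Same idea as `try_files "" <next-uri>`: check this root, else move on.
        candidate = os.path.join(root, relpath.lstrip("/"))
        if exists(candidate):
            return ("local", candidate)
    # Nothing found locally: this is where the request would be proxied to S3.
    return ("s3", relpath)


if __name__ == "__main__":
    # Hypothetical mount points standing in for the three local filesystems.
    ROOTS = ["/mnt/main", "/mnt/backup", "/mnt/backup2"]
    print(resolve("/path/to/test.txt", ROOTS))
```

In the failing trace only the first stat() ran before the crash, whereas the healthy trace shows all three local misses before the S3 proxy connection, i.e. the full walk through this chain.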

Normally, this is what should happen:

epoll_wait(5, {{EPOLLIN, {u32=857063440, u64=139771977777168}}}, 512, 4294967295) = 1
accept(6, {sa_family=AF_INET, sin_port=htons(46329), sin_addr=inet_addr("my_ip")}, [29640703181062160]) = 3
ioctl(3, FIONBIO, [1]) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLET, {u32=857063792, u64=139771977777520}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=857063792, u64=139771977777520}}}, 512, 5000) = 1
recvfrom(3, "GET /path/to/request"…, 1024, 0, NULL, NULL) = 210
stat("/path/1/test.txt", 0x7fff3b23df20) = -1 ENOENT (No such file or directory)
stat("/path/2/test.txt", 0x7fff3b23dd90) = -1 ENOENT (No such file or directory)
stat("/path/3/test.txt", 0x7fff3b23dc00) = -1 ENOENT (No such file or directory)
epoll_ctl(5, EPOLL_CTL_MOD, 3, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=857063792, u64=139771977777520}}) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
ioctl(7, FIONBIO, [1]) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=857063969, u64=139771977777697}}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(8976), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_wait(5, {{EPOLLOUT, {u32=857063792, u64=139771977777520}}, {EPOLLOUT, {u32=857063969, u64=139771977777697}}, {EPOLLIN, {u32=857063440, u64=139771977777168}}}, 512, 60000) = 3
recvfrom(3, 0x7fff3b23e1f7, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
getsockopt(7, SOL_SOCKET, SO_ERROR, [3681060957225746432], [4]) = 0
writev(7, [{"GET /path/to/request"…, 203}], 1) = 203
accept(6, {sa_family=AF_INET, sin_port=htons(40347), sin_addr=inet_addr("127.0.0.1")}, [51539607568]) = 8
ioctl(8, FIONBIO, [1]) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLET, {u32=857064145, u64=139771977777873}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=857064145, u64=139771977777873}}}, 512, 5000) = 1
recvfrom(8, "GET /path/to/request"…, 1024, 0, NULL, NULL) = 203
epoll_ctl(5, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=857064145, u64=139771977777873}}) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 9
ioctl(9, FIONBIO, [1]) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLET, {u32=857064321, u64=139771977778049}}) = 0
connect(9, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("s3_ip")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_wait(5, {{EPOLLOUT, {u32=857064145, u64=139771977777873}}}, 512, 60000) = 1
recvfrom(8, 0x7fff3b23e1f7, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(5, {{EPOLLOUT, {u32=857064321, u64=139771977778049}}}, 512, 59999) = 1
getsockopt(9, SOL_SOCKET, SO_ERROR, [3681062469054234624], [4]) = 0
writev(9, [{"GET /path/to/request"…, 332}], 1) = 332
epoll_wait(5, {{EPOLLIN|EPOLLOUT, {u32=857064321, u64=139771977778049}}}, 512, 59998) = 1
recvfrom(9, "HTTP/1.1 200 OK\r\nx-amz-id-2: [snip]"…, 4096, 0, NULL, NULL) = 447
readv(9, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 3649}], 1) = 0
close(9) = 0
writev(8, [{"HTTP/1.1 200 OK\r\nServer: nginx/0"…, 442}, {"bla\n", 4}], 2) = 446
write(27, "127.0.0.1 - - [20/Dec/2010:11:27"…, 190) = 190
close(8) = 0
epoll_wait(5, {{EPOLLIN|EPOLLOUT, {u32=857063969, u64=139771977777697}}}, 512, 59987) = 1
recvfrom(7, "HTTP/1.1 200 OK\r\nServer: nginx/0"…, 4096, 0, NULL, NULL) = 446
readv(7, [{"\0\0\300\220k\0\0\0\0\0\310\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\270\240k\0\0\0"…, 3650}], 1) = 0
close(7) = 0
writev(3, [{"HTTP/1.1 200 OK\r\nServer: nginx/0"…, 470}, {"bla\n", 4}], 2) = 474
write(27, "my_ip - - [20/Dec/2010:"…, 219) = 219
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
recvfrom(3, 0x69e5c0, 1024, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(5, {{EPOLLIN|EPOLLOUT, {u32=857063792, u64=139771977777520}}}, 512, 5000) = 1
recvfrom(3, "", 1024, 0, NULL, NULL) = 0
write(26, "2010/12/20 11:27:56 [info] 10313"…, 92) = 92
close(3) = 0

Here’s how the config lines work:

Files served from Set A

location ~ ^/dv0([\d])/(.*)$ {
    alias /path1/$1b/path/$2;
    try_files "" /dv$1b/$2;
}

Files served from Set B

location ~ ^/dv0([\d])b/(.*)$ {
    alias /path2/$1/path/$2;
    try_files "" /union/$2;
}

Files served from the Union of A and B

location ~ ^/union/path3/(.*)$ {
    alias /union/path3/$1;
    try_files "" @failUnion;
}

location @failUnion {
    rewrite ^/union/path/(.*)$ /$1 break;
    proxy_pass http://127.0.0.1:8976;
}

location ~ ^/path/(.*)$ {
    rewrite ^/path/(.*)$ /union/path3/$1 last;
}

Proxy the request back to Amazon S3

location / {
    proxy_pass http://bucket.s3.amazonaws.com;

    proxy_s3_auth on;
    proxy_s3_secure_download off; # optional if you're also using secdownload
    proxy_s3_bucket bucket;
    proxy_s3_user user;
    proxy_s3_pass pass;
}

Hopefully that gives you some idea of what we’re doing without revealing
anything confidential… ;-)

Thanks,
Nicholas

Nicholas Tang
VP, Dev Ops

t: +1 (646) 495 9707 | m: +1 (347) 410 6066
111 8th Avenue, Floor 15, New York, NY 10011
http://www.livestream.com/

Hi,

> We’re running nginx 0.7.68 w/ the s3-proxy patch applied, and we’re seeing
> reproducible segfaults when we hit (or miss) certain files:

Can you reproduce them without the s3-proxy patch?

Best regards,
Piotr S.

So what I tried was splitting it into two copies of nginx (both with the S3
patch). The first handles all of the local access and, when a file doesn’t
exist locally and has to be fetched from S3, proxies the request to the second
copy. That second copy does the S3 forwarding, and with this setup everything
now seems to work OK.

The next issue is that nginx seems to block and freeze on filesystem access
when there are problems. We had a problem with one network mount, and that was
even with these timeouts specified:

sendfile on;
tcp_nopush on;
tcp_nodelay on;

client_body_timeout 5;
client_header_timeout 5;
send_timeout 1;

#keepalive_timeout 0;

#keepalive_timeout 75 20;

keepalive_timeout 5 5;

nginx still hung for a long time when accessing a filesystem that was having
issues. (We’re using GlusterFS to mount remote volumes from our storage
servers.) So we added Gluster timeouts, and those do seem to time out the
Gluster volumes, but nginx still stays locked up for an unacceptable amount of
time.

Is there a way to have nginx time out the request (return a 503, for
instance) if it doesn’t get a response from the filesystem within x seconds?
Ideally, we’d like it to time out after just a second or two.
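For context: the strace output earlier in the thread shows the nginx worker calling stat() directly inside its event loop, so a hung mount stalls the whole worker. The client_body_timeout, client_header_timeout, send_timeout and keepalive_timeout directives listed above govern client socket I/O, not filesystem calls, which would explain why they don’t help here. The general technique for bounding a blocking filesystem call is to run it on a separate thread and stop waiting after a deadline. The Python sketch below is not nginx code and not a drop-in fix; it only illustrates that pattern (for example, for a small health probe on the Gluster mount). The function name, the 8-thread pool and the 200/404/503 mapping are assumptions for illustration.

```python
import concurrent.futures
import os

# Shared worker pool (size is an assumption). A stat() stuck on a dead mount
# keeps its thread busy until the kernel gives up, so size the pool with that in mind.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def stat_with_timeout(path, timeout=2.0, stat=os.stat):
    """Stat `path`, but stop waiting after `timeout` seconds.

    Returns an HTTP-style status plus the stat result:
      (200, os.stat_result)  the file exists
      (404, None)            the file does not exist
      (503, None)            the filesystem is hung or returned an I/O error
    """
    future = _POOL.submit(stat, path)
    try:
        return 200, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # We stop waiting, but the stat() call itself keeps running in the
        # worker thread; it cannot be cancelled from here.
        return 503, None
    except FileNotFoundError:
        return 404, None
    except OSError:
        # e.g. EIO or ENOTCONN from a failing network mount
        return 503, None


if __name__ == "__main__":
    # Hypothetical path on the Gluster mount.
    print(stat_with_timeout("/mnt/gluster/path/to/test.txt", timeout=1.5))
```

The key limitation is in the comment on the timeout branch: the stuck call is abandoned, not cancelled, so a badly hung mount can still exhaust the pool. A proxy placed in front of nginx would presumably face the same trade-off: it can give the client a 503, but the blocked nginx worker stays blocked.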

What I’m looking at now is putting Varnish in front of nginx and seeing if I
can get the timeout to work there, but obviously this would be a lot cleaner
with fewer layers. :-)

Thanks,
Nicholas
