"bus error" on Linux Sparc

Hi everyone

When I try to start nginx-0.8.14 on a Sparc-Linux I get a “Bus Error”:

nginx -c /etc/nginx/nginx.conf

Bus error

With ‘strace’ I was able to track it down a little:
[…]
open("/etc/nginx/nginx.conf", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1610, ...}) = 0
pread(4, Bus error

Now, according to that I think the error might be in ngx_read_file where
a u_char* is passed as a void* as second argument for pread, which in
turn is probably a long and should therefore be aligned on sparc.
Even if that is the actual problem I have no idea how to fix it. Any
ideas?

Cheers,
Tiziano
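
On the alignment theory above: `pread` takes its buffer as a `void *` and treats it as plain bytes, so passing a `u_char *` carries no alignment requirement of its own. What can fault on SPARC is loading a multi-byte type through a misaligned pointer. A minimal sketch of the distinction, using hypothetical helpers `load_u32` and `store_u32` that are not part of nginx:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte address.
 * memcpy makes no alignment assumption about p, so this stays safe
 * where a direct *(uint32_t *) p load could raise SIGBUS. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: the matching store to any byte address. */
static void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

With a `u_char buf[8]`, calling `store_u32(buf + 1, x)` and then `load_u32(buf + 1)` is well defined even though `buf + 1` is not 4-byte aligned.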

Posted at Nginx Forum:

On Tue, Sep 08, 2009 at 10:34:58AM -0400, dev-zero wrote:

pread(4, Bus error

Now, according to that I think the error might be in ngx_read_file where a u_char* is passed as a void* as second argument for pread, which in turn is probably a long and should therefore be aligned on sparc.
Even if that is the actual problem I have no idea how to fix it. Any ideas?

Could you create a core dump and run

gdb /path/to/nginx /path/to/core
bt

?

Igor S. Wrote:

Could you create a core dump and run

gdb /path/to/nginx /path/to/core
bt

Sure, here we go…

The first bug is a segfault I’ve been experiencing a lot when doing a
config check while the server is already running:

Core was generated by `/usr/sbin/nginx -c /etc/nginx/nginx.conf -t'.
Program terminated with signal 11, Segmentation fault.

#0 ngx_hash_add_key (ha=0xff82b270, key=0xf7ba5a4c, value=0xc960,
flags=75334) at src/core/ngx_hash.c:814
814 *name = *key;
#1 0x00000008 in ?? ()
(gdb)

And the coredump+gdb-bt from the bus error mentioned before:

Core was generated by `nginx -c /etc/nginx/nginx.conf -t'.
Program terminated with signal 10, Bus error.

#0 ngx_palloc (pool=0xa1230, size=784) at src/core/ngx_palloc.c:126
126 m = ngx_align_ptr(p->d.last, NGX_ALIGNMENT);
(gdb) bt
#0 ngx_palloc (pool=0xa1230, size=784) at src/core/ngx_palloc.c:126
#1 0x0003c608 in ngx_http_core_create_srv_conf (cf=0xffa87280) at
src/core/ngx_array.h:43
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
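
For readers unfamiliar with line 126: the faulting statement rounds the pool's `last` pointer up to a multiple of `NGX_ALIGNMENT`. A minimal sketch of that rounding, modeled on nginx's `ngx_align_ptr` macro but written as a stand-alone function (the name `align_ptr` is illustrative and not from nginx):

```c
#include <stdint.h>

/* Round p up to the next multiple of a; a must be a power of two.
 * This is pure integer arithmetic on the address and reads no memory. */
static unsigned char *align_ptr(unsigned char *p, uintptr_t a)
{
    return (unsigned char *) (((uintptr_t) p + (a - 1)) & ~(a - 1));
}
```

Since the rounding itself touches no memory, the only load on that line is the read of `p->d.last`, which suggests that a fault there comes from `p` being invalid or misaligned rather than from the alignment arithmetic.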

And the original C/LDFLAGS I’ve been using to compile nginx:
CFLAGS="-O2 -mcpu=ultrasparc -pipe -ggdb"
LDFLAGS="-Wl,-O2,--hash-style=gnu,--sort-common,--as-needed"
Dropping the LDFLAGS didn’t help, and neither did reducing -O2 to -O1 in
CFLAGS.
GCC version is: 4.3.2, glibc: 2.9_p20081201, kernel: 2.6.31-rc9

Thanks in advance for your help,
Cheers,
Tiziano

On Wed, Sep 09, 2009 at 05:27:47AM -0400, dev-zero wrote:

And the original C/LDFLAGS I’ve been using to compile nginx:
CFLAGS="-O2 -mcpu=ultrasparc -pipe -ggdb"
LDFLAGS="-Wl,-O2,--hash-style=gnu,--sort-common,--as-needed"
Dropping the LDFLAGS didn’t help, and neither did reducing -O2 to -O1 in CFLAGS.
GCC version is: 4.3.2, glibc: 2.9_p20081201, kernel: 2.6.31-rc9

Could you rebuild nginx with CFLAGS="-DNGX_ALIGNMENT=16 …"

Igor S. Wrote:

Could you rebuild nginx with
CFLAGS="-DNGX_ALIGNMENT=16 …"

Done. Did not help, same result:

nginx -c /etc/nginx/nginx.conf -t

Bus error (core dumped)

On Wed, Sep 09, 2009 at 06:25:15AM -0400, dev-zero wrote:

Done. Did not help, same result:

nginx -c /etc/nginx/nginx.conf -t

Bus error (core dumped)

If the bus error is at src/core/ngx_palloc.c:126 again, then:

bt
p *p

Igor S. Wrote:

If the bus error is at src/core/ngx_palloc.c:126
again, then:

bt
p *p

To make sure it isn’t my CFLAGS or LDFLAGS, I built it from a freshly
unpacked tarball with the following options (by the way, not all options
seem to trigger the bug equally often):

CFLAGS="-DNGX_ALIGNMENT=16 -mcpu=v9 -O -pipe -ggdb" ./configure
--prefix=/usr --conf-path=/etc/nginx/nginx.conf
--http-log-path=/var/log/nginx/access_log
--error-log-path=/var/log/nginx/error_log --pid-path=/var/run/nginx.pid
--http-client-body-temp-path=/var/tmp/nginx/client
--http-proxy-temp-path=/var/tmp/nginx/proxy
--http-fastcgi-temp-path=/var/tmp/nginx/fastcgi --with-md5-asm
--with-sha1-asm --with-rtsig_module --with-select_module
--with-poll_module --with-file-aio --with-ipv6 --with-http_ssl_module
--without-pcre --with-http_addition_module --with-http_dav_module
--with-http_gzip_static_module --without-http_ssi_module
--without-http_userid_module --without-http_geo_module
--without-http_map_module --without-http_referer_module
--without-http_rewrite_module --without-http_proxy_module
--without-http_memcached_module --without-http_limit_zone_module
--without-http_limit_req_module --without-http_upstream_ip_hash_module
[…]

make

[…]

./objs/nginx -c /etc/nginx/nginx.conf -t

Bus error (core dumped)

gdb ./objs/nginx core

GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show
copying”
and “show warranty” for details.
This GDB was configured as “sparc-unknown-linux-gnu”…
Reading symbols from /lib/libcrypt.so.1…done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libssl.so.0.9.8…done.
Loaded symbols for /usr/lib/libssl.so.0.9.8
Reading symbols from /usr/lib/libcrypto.so.0.9.8…done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.8
Reading symbols from /lib/libdl.so.2…done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libz.so.1…done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libc.so.6…done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2…done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2…done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1…done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2…done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2…done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `./objs/nginx -c /etc/nginx/nginx.conf -t'.
Program terminated with signal 10, Bus error.

#0 ngx_palloc (pool=0xa7ff8, size=0) at src/core/ngx_palloc.c:126
126 m = ngx_align_ptr(p->d.last, NGX_ALIGNMENT);
(gdb) p *p
Cannot access memory at address 0x61746f6d
(gdb)

Maybe this gives another hint:

./objs/nginx -c /etc/nginx/nginx.conf -t

Bus error (core dumped)

./objs/nginx -c /tmp/nginx.conf -t

the configuration file /tmp/nginx.conf syntax is ok
: 8192 worker_connections are more than open file resource limit: 1024
configuration file /tmp/nginx.conf test is successful

./objs/nginx -t

the configuration file /etc/nginx/nginx.conf syntax is ok
: 8192 worker_connections are more than open file resource limit: 1024
configuration file /etc/nginx/nginx.conf test is successful

All three runs are for the same configuration file.

Posted at Nginx Forum:

Hey

Did you get anywhere with this issue? I am experiencing it as well with
a new web node I’m trying to set up for iusethis.com, on a Sun T1000
running Debian.

Marcus

On Sat, Sep 26, 2009 at 08:38:36PM -0400, marcusramberg wrote:

Hey

Did you get anywhere with this issue? I am experiencing it as well with a new web node I’m trying to set up for iusethis.com, on a Sun T1000 running Debian.

The bug happens some time before the “bus error” occurs, so it is not
easy to find the cause from a gdb backtrace in this case.
If anyone can give me access to a Sparc Debian box where this error can
be reproduced, I will fix it much more quickly.

On Tue, Sep 29, 2009 at 12:05:37PM +0400, Igor S. wrote:

The bug happens some time before the “bus error” occurs, so it is not
easy to find the cause from a gdb backtrace in this case.
If anyone can give me access to a Sparc Debian box where this error can
be reproduced, I will fix it much more quickly.

The attached patch should fix the bug.

Igor S. Wrote:

If the bus error is at src/core/ngx_palloc.c:126
again, then:

bt
p *p

By running nginx directly under gdb, I was finally able to get a
useful backtrace:
rogue nginx-0.8.14 # gdb ./objs/nginx
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show
copying”
and “show warranty” for details.
This GDB was configured as “sparc-unknown-linux-gnu”…
(gdb) set follow-fork-mode child
(gdb) run -c /tmp/nginx.conf -t
Starting program: /root/nginx-0.8.14/objs/nginx -c /tmp/nginx.conf -t

Program received signal SIGBUS, Bus error.
0x00017a88 in ngx_palloc (pool=0xf9240, size=784) at
src/core/ngx_palloc.c:126
126 m = ngx_align_ptr(p->d.last, NGX_ALIGNMENT);
(gdb) bt
#0 0x00017a88 in ngx_palloc (pool=0xf9240, size=784) at
src/core/ngx_palloc.c:126
#1 0x000679dc in ngx_array_init (array=0xfab30, pool=0xf9240, n=4,
size=196) at src/core/ngx_array.h:43
#2 0x00067ba4 in ngx_http_core_create_srv_conf (cf=0xff8991f8) at
src/http/ngx_http_core_module.c:2799
#3 0x00066880 in ngx_http_core_server (cf=0xff8991f8, cmd=0xda138,
dummy=0xf5db0) at src/http/ngx_http_core_module.c:2344
#4 0x000332a4 in ngx_conf_handler (cf=0xff8991f8, last=1) at
src/core/ngx_conf_file.c:393
#5 0x00032d14 in ngx_conf_parse (cf=0xff8991f8, filename=0x0) at
src/core/ngx_conf_file.c:243
#6 0x0005e008 in ngx_http_block (cf=0xff8991f8, cmd=0xd9e80,
conf=0xf59b8) at src/http/ngx_http.c:241
#7 0x000332a4 in ngx_conf_handler (cf=0xff8991f8, last=1) at
src/core/ngx_conf_file.c:393
#8 0x00032d14 in ngx_conf_parse (cf=0xff8991f8, filename=0xf5308) at
src/core/ngx_conf_file.c:243
#9 0x0002ee4c in ngx_init_cycle (old_cycle=0xff8992f0) at
src/core/ngx_cycle.c:262
#10 0x00013edc in main (argc=4, argv=0xff8994d4) at src/core/nginx.c:317
(gdb) p *p
Cannot access memory at address 0x782d6a61
(gdb)
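
One observation on the two unreadable addresses reported for `p`, 0x61746f6d in the earlier session and 0x782d6a61 here: every byte falls in the printable ASCII range, which is what a pointer tends to look like after text has been written over it. A small sketch that decodes a 32-bit value in big-endian byte order, as it would be stored on SPARC; the helper name `addr_to_ascii` is made up for illustration:

```c
#include <stdint.h>

/* Write the four bytes of addr, most significant first, into out as a
 * NUL-terminated string; non-printable bytes are shown as '.'. */
static void addr_to_ascii(uint32_t addr, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char) (addr >> (24 - 8 * i));

        out[i] = (c >= 0x20 && c < 0x7f) ? (char) c : '.';
    }
    out[4] = '\0';
}
```

Decoded this way, 0x61746f6d reads "atom" and 0x782d6a61 reads "x-ja". That would be consistent with string data overwriting the pool structure some time before the faulting line runs, although the thread does not confirm where those bytes came from.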

On Wed, Sep 30, 2009 at 12:12:26AM +0400, Igor S. wrote:

If anyone can give me access to Sparc Debian box where this error can
be reproduced I will fix it much more quickly.

The attached patch should fix the bug.

The updated patch.

On 30.09.2009, at 8:46, Igor S. wrote:

The bug happens some time before the “bus error” occurs, so it is not
easy to find the cause from a gdb backtrace in this case.
If anyone can give me access to a Sparc Debian box where this error
can be reproduced, I will fix it much more quickly.

Is there a way to reproduce this type of error using just an
emulator? If there is, I can with pleasure set up a continuous testing
cycle based on the test set by Maxim D…

TestSwarm solves this problem for JavaScript developers,
and it works great :)