Current-Users archive
Re: Crash on -current in pool_drain()
On Sun, 18 Oct 2015, Nick Hudson wrote:
On 10/18/15 00:30, Paul Goyette wrote:
Under heavy load, and after several hours of building packages, I am
seeing the following crash. I'm doing a bisect to narrow down more,
but it has been happening for at least a week, with kernel and all
modules built from sources updated on 2015-10-13 at 08:30:00 UTC.
(This is on amd64)
Here's the backtrace from gdb:
[snip]
#8 0xffffffff80333415 in pool_drain (ppp=ppp@entry=0xfffffe810f528e30)
at /build/netbsd-local/src/sys/kern/subr_pool.c:1429
#9 0xffffffff802d1791 in uvm_pageout (arg=<optimized out>)
at /build/netbsd-local/src/sys/uvm/uvm_pdaemon.c:343
#10 0xffffffff80100807 in lwp_trampoline ()
#11 0x0000000000000000 in ?? ()
(gdb) fr 8
#8 0xffffffff80333415 in pool_drain (ppp=ppp@entry=0xfffffe810f528e30)
at /build/netbsd-local/src/sys/kern/subr_pool.c:1429
1429 if (drainpp == NULL) {
(gdb) disass pool_drain
This looks like one of the crashes riz@ had on a tegra which I think was also
building packages.
I'm still working on a bisect - so far I have confirmed that the issue
occurs at least as far back as Oct 10, possibly earlier.
My "reproduction" involves building a large number of packages, one at
a time, with MAKE_JOBS=3. At first I wasn't paying much attention, but
all of the crashes I specifically remember were on the 359th package,
www/firefox !
=> 0xffffffff80333415 <+59>: mov (%rax),%rdx
I think %rax will be "weird" and indicate pool_head list corruption - no idea
why, though.
%rax looks reasonable:
(gdb) info reg
rax 0xffffffff8099fb40 -2137392320
rbx 0x0 0
rcx 0xffffffff80724880 -2139993984
rdx 0x0 0
...
and matches the value reported for drainpp
(gdb) print drainpp
$1 = (struct pool *) 0xffffffff8099fb40
and which also matches the tailq's tqh_last
$3 = {tqh_first = 0xffffffff80724880 <uvm_amap_cache>,
tqh_last = 0xffffffff8099fb40}
However, it seems that something has been badly corrupted:
(gdb) print *drainpp
Cannot access memory at address 0xffffffff8099fb40
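A note on why drainpp and tqh_last can hold the same value: in
sys/queue.h, tqh_last points at the tqe_next field of the last element,
not at the element itself. If the list entry is the first member of the
structure (assumed here for struct pool and pr_poollist, not checked
against this source tree), the two addresses coincide, so the match may
simply mean that drainpp is the last pool on the list. The sketch below
uses a hypothetical struct demo_pool to illustrate this; it is not the
kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/*
 * Hypothetical stand-in for struct pool. The list entry is placed first,
 * which is the layout assumed for pr_poollist in the note above.
 */
struct demo_pool {
	TAILQ_ENTRY(demo_pool) link;	/* plays the role of pr_poollist */
	int id;
};

TAILQ_HEAD(demo_head, demo_pool);

/*
 * Build a two-element tail queue and return 1 if tqh_last, viewed as a
 * plain address, equals the address of the last element.
 */
int
tqh_last_aliases_last_element(void)
{
	struct demo_head head;
	struct demo_pool a = { .id = 1 }, b = { .id = 2 };

	TAILQ_INIT(&head);
	TAILQ_INSERT_TAIL(&head, &a, link);
	TAILQ_INSERT_TAIL(&head, &b, link);

	/* tqh_last points at the tqe_next field of the last element. */
	assert(head.tqh_last == &b.link.tqe_next);

	/* The entry sits at offset 0, so that field shares the element's address. */
	assert(offsetof(struct demo_pool, link) == 0);

	return (uintptr_t)head.tqh_last == (uintptr_t)&b;
}
```

Under the same assumption, a load such as mov (%rax),%rdx with %rax
holding drainpp would be consistent with reading the tqe_next field at
offset 0 of *drainpp, which fails if that memory is no longer mapped.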
+------------------+--------------------------+-------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+-------------------------+