Current-Users archive
Re: amd64-7.0_STABLE trap panic
On Sat, 7 Nov 2015, John D. Baker wrote:
> It happened again. This time, the crashdump shows:
>
> $ crash -M netbsd.3.core -N netbsd.3
> Crash version 7.0_STABLE, image version 7.0_STABLE.
> System panicked: trap
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NAGR() at 0
> _KERNEL_OPT_NAGR() at 0
> vpanic() at vpanic+0x145
> snprintf() at snprintf
> startlwp() at startlwp
> calltrap() at calltrap+0x11
> ufsquota_free() at ufsquota_free+0x15
> ufs_reclaim() at ufs_reclaim+0xaf
> ffs_reclaim() at ffs_reclaim+0xa1
> VOP_RECLAIM() at VOP_RECLAIM+0x2f
> vclean() at vclean+0xa6
> cleanvnode() at cleanvnode+0xb8
> vdrain_thread() at vdrain_thread+0x58
> crash>
>
> At the time, it was performing a CVS update to pick up the latest
> pull-ups to the netbsd-7 branch while an NFS client was writing to
> my home directory with 'scp' copying files from a remote system.
>
> Up through 7.0 (release), I'd never had a problem with the machine. I'm
> reluctant to suspect hardware problems...
Interesting - the stack traceback diverges from your previous report,
after the entry for startlwp.
For amd64, this routine is located in sys/arch/amd64/amd64/trap.c and
starts with
    void
    startlwp(void *arg)
    {
            ucontext_t *uc = arg;
            lwp_t *l = curlwp;
            int error __diagused;

            error = cpu_setmcontext(l, &uc->uc_mcontext, uc->uc_flags);
            KASSERT(error == 0);
            ...
And the machine code at this point looks like:
Dump of assembler code for function startlwp:
0xffffffff8011b1d7 <+0>: push %rbp
0xffffffff8011b1d8 <+1>: mov %rsp,%rbp
0xffffffff8011b1db <+4>: push %r12
0xffffffff8011b1dd <+6>: push %rbx
0xffffffff8011b1de <+7>: mov %rdi,%r12
0xffffffff8011b1e1 <+10>: mov %gs:0x1e8,%rbx
0xffffffff8011b1ea <+19>: lea 0x38(%rdi),%rsi
0xffffffff8011b1ee <+23>: mov (%rdi),%edx
0xffffffff8011b1f0 <+25>: mov %rbx,%rdi
0xffffffff8011b1f3 <+28>: callq 0xffffffff80119d78 <cpu_setmcontext>
0xffffffff8011b1f8 <+33>: test %eax,%eax
It might be useful if you could use gdb on the crash dump. Use the
bt command to figure out which frame is for startlwp, then
(gdb) frame <n>
(gdb) info reg
I'm guessing that %rdi is pointing somewhere invalid, and the 'mov
(%rdi),%edx' at <+23> is triggering the fault. (This is probably the
read of uc->uc_flags.)
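
To show why I read 'mov (%rdi),%edx' as the uc_flags access, here is a small sketch with a mock structure. The mock_* types below are hypothetical stand-ins that I made up for illustration; they are not copied from the NetBSD headers, and the real ucontext_t layout should be checked there. Assuming an LP64 target and a layout along these lines, uc_flags sits at offset 0 (matching the load through (%rdi)) and uc_mcontext at offset 0x38 (matching 'lea 0x38(%rdi),%rsi'):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel types, NOT the real NetBSD
 * definitions. Member sizes are assumptions chosen for illustration.
 */
struct mock_sigset   { uint32_t bits[4]; };                 /* assumed 16 bytes */
struct mock_stack    { void *sp; size_t size; int flags; }; /* 24 bytes on LP64 */
struct mock_mcontext { uint64_t gregs[26]; };               /* size is arbitrary here */

struct mock_ucontext {
	unsigned int          uc_flags;    /* first member, so offset 0 */
	struct mock_ucontext *uc_link;
	struct mock_sigset    uc_sigmask;
	struct mock_stack     uc_stack;
	struct mock_mcontext  uc_mcontext;
};

/* The first member of a struct is always at offset 0. */
static_assert(offsetof(struct mock_ucontext, uc_flags) == 0,
    "uc_flags must be the first member");

/* Offset that a load of uc->uc_flags would use relative to uc. */
size_t
mock_flags_offset(void)
{
	return offsetof(struct mock_ucontext, uc_flags);
}

/* Offset that taking &uc->uc_mcontext would add to uc. */
size_t
mock_mcontext_offset(void)
{
	return offsetof(struct mock_ucontext, uc_mcontext);
}
```

If the real structure is laid out like this, a bad 'arg' pointer would fault at the first dereference of uc, which is the uc_flags load and not the lea (lea only computes an address and never touches memory).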
Now, as for why this is broken, I have no idea. :(
+------------------+--------------------------+-------------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+-------------------------+