Current-Users archive


Re: panic: pmap_tlb_pendcount > 0 failed



On Thu, Mar 28, 2013 at 09:00:45AM +0100, Thomas Klausner wrote:
> On Thu, Mar 28, 2013 at 12:29:40AM +0100, Thomas Klausner wrote:
> > Hi!
> > 
> > I've just run a GENERIC-6.99.18/amd64 kernel from today on a newly
> > acquired machine, and see:
> > 
> > ...
> > acpicpu11 at cpu11: ACPI CPU
> > panic: kernel diagnostic assertion "pmap_tlb_pendcount > 0" failed: file 
> > "../../../../arch/x86/x86/pmap_tlb.c", line 451
> > fatal breakpoint trap in supervisor mode
> > trap type 1 code 0 rip ffffffff802546ed cs 8 rflags 46 cr2 0 ilevel 0 rsp 
> > fffffe813a6369e0
> > curlwp 0xfffffe887556bb20 pid 0.71 lowest kstack 0xfffffe813a633000
> > Stopped in pid 0.71 (system) at netbsd:breakpoint+0x5: leave
> > db{10}> bt
> > breakpoint() at netbsd:breakpoint+0x5
> > vpanic() at netbsd:vpanic+0x136
> > kern_assert() at netbsd:kern_assert+0x48
> > pmap_tlb_intr() at netbsd:pmap_tlb_intr+0xf4
> > DDB lost frame for netbsd:Xintr_lapic_tlb+0x98, trying 0xfffffe813636aa0
> > Xintr_lapic_tlb() at netbsd:Xintr_lapic_tlb+0x98
> > --- interrupt ---
> > 246:
> > db{10}>
> > 
> > dmesg is too long to copy by hand, but available at the db prompt.
> > It's a Supermicro X9SRi with a Xeon E5-1650@3.20GHz.
> 
> PR 47437 by Taylor R Campbell might be related; he writes:
> 
>    Sometimes when I boot a many-core machine, during autoconf I
>    get a panic after the ACPI CPU devices are configured.  I've
>    seen the panic several times; last night I caught it on the
>    serial console for the first time with ddb and grabbed a stack
>    trace.  I believe it always happens after all the acpicpuN
>    devices are attached, but I'm not sure.
> 
> so it happens at the same point in the boot process, but his panic is:
> 
> panic: kernel diagnostic assertion "pmap_tlb_pendcount < ncpu" failed: file 
> "/home/riastradh/netbsd/current/src/sys/arch/x86/x86/pmap_tlb.c", line 434

I've looked at my panic a bit more:

pmap_tlb_pendcount is a static volatile u_int, and the KASSERT that
triggers is:
        KASSERT(pmap_tlb_pendcount > 0);
Since the counter is unsigned, this can only fail if it is exactly
zero at that time.
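
A small illustration of the unsigned arithmetic involved (plain
user-space C with hypothetical helper names, not code from
pmap_tlb.c): for a u_int, "> 0" is the same test as "!= 0", and
decrementing a counter that is already zero wraps around to UINT_MAX
instead of going negative.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: for an unsigned counter, "count > 0" is the
 * same test as "count != 0", so KASSERT(count > 0) fails only when
 * count == 0.
 */
static bool
pendcount_nonzero(unsigned int count)
{
	return count > 0;
}

/*
 * Hypothetical helper: unsigned arithmetic is modulo UINT_MAX + 1,
 * so decrementing a counter that is already zero yields UINT_MAX,
 * never a negative value.
 */
static unsigned int
pendcount_decrement(unsigned int count)
{
	return count - 1;
}
```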

The KASSERT is in the function pmap_tlb_intr() in
sys/arch/x86/x86/pmap_tlb.c. I have found only one caller for it, the
TLB shootdown interrupt handler in sys/arch/i386/i386/vector.S:

/*
 * TLB shootdown handler.
 */
IDTVEC(intr_lapic_tlb)
        pushl   $0
        pushl   $T_ASTFLT
        INTRENTRY
        movl    $0, _C_LABEL(local_apic)+LAPIC_EOI
        call    _C_LABEL(pmap_tlb_intr)
        INTRFASTEXIT
IDTVEC_END(intr_lapic_tlb)

On the other side, there is only one place where it is set to a
nonzero value, again in sys/arch/x86/x86/pmap_tlb.c (the
compare-and-swap from 0 to rcpucount):
...
        local = kcpuset_isset(target, cid) ? 1 : 0;
        rcpucount = kcpuset_countset(target) - local;
#ifdef MULTIPROCESSOR
        if (rcpucount) {
...
                while (atomic_cas_uint(&pmap_tlb_pendcount, 0, rcpucount)) {
                        splx(s);
                        count = SPINLOCK_BACKOFF_MIN;
                        while (pmap_tlb_pendcount) {
                                KASSERT(pmap_tlb_pendcount < ncpu);
                                SPINLOCK_BACKOFF(count);
                        }
                        s = splvm();
                        /* An interrupt might have done it for us. */
                        if (tp->tp_count == 0) {
                                splx(s);
                                return;
                        }
                }
...
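
Roughly, the handshake in that excerpt is: the initiating CPU installs
rcpucount into pmap_tlb_pendcount with a compare-and-swap that only
succeeds when the counter is 0, and each targeted CPU's
pmap_tlb_intr() expects the counter to be nonzero when it runs. Below
is a user-space model of that handshake, only a sketch under stated
assumptions: threads stand in for CPUs and plain function calls for
IPIs, the names (pendcount, shootdown_begin, remote_handler,
simulate_shootdown) are hypothetical, and the per-CPU decrement in the
handler is assumed, since the excerpt does not show the body of
pmap_tlb_intr(). It is not the kernel code.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REMOTE 16

/* Hypothetical stand-in for pmap_tlb_pendcount. */
static atomic_uint pendcount;
/* Counts handler runs that would have tripped KASSERT(pendcount > 0). */
static atomic_uint violations;

/*
 * Initiator: publish how many remote CPUs must acknowledge.  Like
 * atomic_cas_uint(&pmap_tlb_pendcount, 0, rcpucount), this only
 * succeeds when no other shootdown is outstanding (counter == 0).
 * Returns true on success; the kernel primitive returns the old value
 * instead, hence its "while (atomic_cas_uint(...))" retry loop.
 */
static bool
shootdown_begin(unsigned int rcpucount)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&pendcount, &expected,
	    rcpucount);
}

/*
 * Remote CPU, modelled on pmap_tlb_intr().  The acknowledging
 * decrement is an assumption of this sketch.  Instead of panicking,
 * a run that finds no pending request is counted in "violations" and
 * does not decrement (which would wrap the unsigned counter).
 */
static void *
remote_handler(void *arg)
{
	(void)arg;
	if (atomic_load(&pendcount) == 0) {
		atomic_fetch_add(&violations, 1);
		return NULL;
	}
	/* ... the TLB invalidation would happen here ... */
	atomic_fetch_sub(&pendcount, 1);
	return NULL;
}

/*
 * One complete shootdown to nremote simulated CPUs.  Returns the
 * counter value once the initiator has seen every acknowledgement.
 */
static unsigned int
simulate_shootdown(unsigned int nremote)
{
	pthread_t tid[MAX_REMOTE];
	unsigned int i, created;

	/* Like the kernel's "if (rcpucount)": nothing to send for 0 CPUs. */
	if (nremote == 0 || nremote > MAX_REMOTE)
		return atomic_load(&pendcount);
	while (!shootdown_begin(nremote))
		sched_yield();		/* another shootdown is in flight */
	for (i = 0; i < nremote; i++) {
		if (pthread_create(&tid[i], NULL, remote_handler, NULL) != 0)
			break;
	}
	created = i;
	/* Acknowledge on behalf of any simulated CPU that failed to start. */
	for (; i < nremote; i++)
		(void)remote_handler(NULL);
	/* Wait for all acks, like the "while (pmap_tlb_pendcount)" spin. */
	while (atomic_load(&pendcount) != 0)
		sched_yield();
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&pendcount);
}
```

In this model, for example, after shootdown_begin(2) two calls to
remote_handler() bring the counter back to 0, and a third call is
counted as a violation, which is the situation where the real KASSERT
would fire.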


I don't know enough about this to dig further here.
 Thomas

