Port-alpha archive
Re: Panic on -current with MULTIPROCESSOR
> On Mar 30, 2021, at 1:15 PM, John Klos <john%ziaspace.com@localhost> wrote:
>
> Hi,
>
> I've been running pbulk builds on my DS25 with no problems at all when the kernel is compiled without MULTIPROCESSOR. We have a full pbulk build of 2020Q4 :)
>
> I just tried with MULTIPROCESSOR with -current from 28-March-2021, but it panicked within a few hours of starting a pbulk scan.
>
> Does anyone have a clue about what might be going on here?
This indicates that the pmap is having TLB invalidations processed on the local CPU (id 0) and a remote CPU (id 1). The local CPU is running at splhigh() (IPL 6 == ALPHA_PSL_IPL_HIGH). The local CPU has sent an IPI to the remote CPU to process the invalidation, and has timed out waiting for the remote CPU to process it.
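As a small illustration of how the mask lines in the panic output map to CPU ids, here is a sketch that treats bit N of a mask as CPU id N. That bit-to-id mapping is an assumption drawn from the values quoted in this thread (0x1 for cpu0, 0x2 for cpu1), and mask_to_cpu_ids() and print_mask() are hypothetical helpers, not functions from the NetBSD pmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: store the id of every CPU whose bit is set in
 * `mask` into `ids` and return how many were found.  Assumes bit N of
 * the mask stands for CPU id N.
 */
int
mask_to_cpu_ids(uint64_t mask, int ids[64])
{
	int n = 0;

	assert(ids != NULL);
	for (int cpu = 0; cpu < 64; cpu++) {
		if (mask & (UINT64_C(1) << cpu))
			ids[n++] = cpu;
	}
	return n;
}

/* Hypothetical helper: print a mask followed by the CPUs it names. */
void
print_mask(const char *label, uint64_t mask)
{
	int ids[64];
	int n = mask_to_cpu_ids(mask, ids);

	printf("%s = 0x%016llx ->", label, (unsigned long long)mask);
	for (int i = 0; i < n; i++)
		printf(" cpu%d", ids[i]);
	printf("\n");
}
```

Calling print_mask("TLB REMOTE PENDING", 0x2) names only cpu1, which matches the reading above: the remote CPU had not yet processed its invalidation when the local CPU gave up waiting.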
Couple of things… I’m actually pretty surprised to see IPL == 6 there. At the top of pmap_tlb_shootnow() is the following:
    /*
     * Acquire the shootdown mutex. This will also block IPL_VM
     * interrupts and disable preemption. It is critically important
     * that IPIs not be blocked in this routine.
     */
    KASSERT((alpha_pal_rdps() & ALPHA_PSL_IPL_MASK) < ALPHA_PSL_IPL_CLOCK);
    mutex_spin_enter(&tlb_lock);
…this is because on the Alpha, IPIs come in at the same IPL as the clock interrupt, which is IPL == 5.
tlb_lock is initialized thus:
mutex_init(&tlb_lock, MUTEX_SPIN, IPL_VM);
…and IPL_VM is ALPHA_PSL_IPL_IO_HI, which is IPL == 4.
So, something is raising our IPL to IPL_HIGH somewhere. How rude!
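To make the level arithmetic concrete, here is a toy model of the three IPL values mentioned above. The MODEL_* names and both functions are made up for illustration and only restate the numbers from this message; they are not the definitions from the NetBSD headers. The model assumes an interrupt is delivered only when its level is above the current IPL.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values copied from the message text, not from the kernel. */
enum {
	MODEL_IPL_VM    = 4,	/* ALPHA_PSL_IPL_IO_HI, the IPL of tlb_lock */
	MODEL_IPL_CLOCK = 5,	/* clock interrupts and IPIs arrive here */
	MODEL_IPL_HIGH  = 6	/* the value printed as TLB LOCAL IPL */
};

/*
 * An IPI comes in at MODEL_IPL_CLOCK, so it can only be taken while
 * the current IPL is strictly below that level.
 */
bool
model_ipi_deliverable(int cur_ipl)
{
	return cur_ipl < MODEL_IPL_CLOCK;
}

/* Mirrors the condition checked by the KASSERT quoted above. */
void
model_assert_shootdown_entry(int cur_ipl)
{
	assert(cur_ipl < MODEL_IPL_CLOCK);
}
```

In this model, holding tlb_lock at level 4 leaves IPIs deliverable, while the level 6 seen in the panic output does not, which is why that value is surprising.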
mutex_spin_enter() is just an alias for mutex_vector_enter() on Alpha, so looking there, we see that MUTEX_SPIN_SPLRAISE() does:
s = splraiseipl(MUTEX_SPIN_IPL(mtx));
…and:
#define MUTEX_SPIN_IPL(mtx) ((mtx)->mtx_ipl)
So, something is a little fishy here. I think there’s a chance that there is some slight brokenness with spin mutexes on Alpha that only shows up on MULTIPROCESSOR kernels. I’ll take a deeper look a little later today.
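For reference, the usual contract of a "raise" operation is that it never lowers the level. The following is a minimal sketch of that contract only, assuming a plain integer stands in for the per-CPU IPL; model_splraise() and model_splx() are hypothetical names and this is not the Alpha splraiseipl() implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Raise *cur_ipl to new_ipl if that is higher, otherwise leave it
 * alone.  Returns the previous level so the caller can restore it.
 */
int
model_splraise(int *cur_ipl, int new_ipl)
{
	int old;

	assert(cur_ipl != NULL);
	old = *cur_ipl;
	if (new_ipl > old)
		*cur_ipl = new_ipl;
	return old;
}

/* Restore a level previously returned by model_splraise(). */
void
model_splx(int *cur_ipl, int saved_ipl)
{
	assert(cur_ipl != NULL);
	*cur_ipl = saved_ipl;
}
```

Under this model, taking an IPL_VM (4) spin mutex from level 0 ends at 4, and taking it from level 6 stays at 6. Raising to 4 can therefore never produce 6 by itself, so the 6 in the panic output has to come from somewhere else.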
>
> Thanks,
> John
>
>
> [ 74219.1614327] TLB LOCAL MASK = 0x0000000000000001
> [ 74219.1614327] TLB REMOTE MASK = 0x0000000000000002
> [ 74219.1614327] TLB REMOTE PENDING = 0x0000000000000002
> [ 74219.1614327] TLB CONTEXT = 0xfffffc001b83dd58
> [ 74219.1614327] TLB LOCAL IPL = 6
> [ 74219.1614327] panic: pmap_tlb_shootnow
> [ 74219.1614327] cpu0: Begin traceback...
> [ 74219.1614327] alpha trace requires known PC =eject=
> [ 74219.1614327] cpu0: End traceback...
> Stopped in pid 5054.5054 (echo) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
> db{0}>
> db{0}> bt
> cpu_Debugger() at netbsd:cpu_Debugger+0x4
> db_panic() at netbsd:db_panic+0xc8
> vpanic() at netbsd:vpanic+0x10c
> panic() at netbsd:panic+0x58
> pmap_tlb_shootnow.part.0() at netbsd:pmap_tlb_shootnow.part.0+0x234
> pmap_remove_internal() at netbsd:pmap_remove_internal+0x328
> pmap_remove() at netbsd:pmap_remove+0x2c
> uvm_unmap_remove() at netbsd:uvm_unmap_remove+0x158
> sys_munmap() at netbsd:sys_munmap+0xb8
> syscall() at netbsd:syscall+0x260
> XentSys() at netbsd:XentSys+0x5c
> --- syscall (73) ---
> --- user mode ---
> PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
> 5054 >5054 7 0 0 fffffc02e8db0500 echo
> 20918 20918 3 1 180 fffffc000ef9e480 make wait
> 19235 19235 3 0 180 fffffc0019d9ee40 make pipe_rd
> 28954 28954 3 1 180 fffffc001aa5c140 make pipe_rd
> 2913 2913 3 0 180 fffffc001b093a80 top select
> 7334 7334 3 1 180 fffffc0016176080 pbulk-scan pipe_rd
> 17864 17864 3 1 180 fffffc000c14ea80 sh wait
> 28220 28220 3 0 180 fffffc0019d9e5c0 sh wait
> 26180 26180 3 1 180 fffffc001aa5c9c0 tcsh pause
> 15702 15702 3 1 180 fffffc001b092100 tcsh pause
> 5805 5805 3 1 180 fffffc001aa5d240 tcsh pause
> 16556 16556 3 0 180 fffffc000c14eec0 tcsh pause
> 18472 18472 3 1 180 fffffc000c14f300 tmux kqueue
> 28224 28224 3 1 180 fffffc001aa5ce00 tcsh pause
> 11357 11357 3 0 180 fffffc0019d9f280 sshd select
> 14035 14035 3 0 180 fffffc0016176d40 sshd poll
> 17970 17970 3 1 180 fffffc000c234680 tcsh ttyraw
> 18858 18858 3 1 180 fffffc00161764c0 sshd select
> 23130 23130 3 0 180 fffffc001b093200 sshd poll
> 338 338 3 0 1c0 fffffc02fd68cdc0 getty ttyraw
> 326 326 3 1 180 fffffc000ef9f9c0 cron nanoslp
> 2204 2204 3 1 180 fffffc000ef9f580 inetd kqueue
> 1695 1695 3 0 180 fffffc000ef9f140 sshd select
> 206 206 3 0 180 fffffc000ef9e8c0 cu poll
> 205 205 3 1 180 fffffc000c234ac0 cu ttyraw
> 203 203 3 1 180 fffffc000c235bc0 ntpd pause
> 2213 2213 3 0 180 fffffc000c235780 tcsh pause
> 2043 2043 3 0 180 fffffc000c235340 tmux kqueue
> 1240 1240 3 0 180 fffffc000c234f00 syslogd kqueue
> 1 1 3 1 180 fffffc000086ee80 init wait
> 0 1313 3 1 200 fffffc000c234240 acctwatch actwat
> 0 228 3 0 200 fffffc00007d4140 physiod physiod
> 0 126 3 1 200 fffffc000c14e200 pooldrain pooldrain
> 0 125 3 1 200 fffffc000086fb40 ioflush syncer
> 0 124 3 1 200 fffffc000086f700 pgdaemon pgdaemon
> 0 121 3 1 200 fffffc02fd68c540 raidio0 raidiow
> 0 120 3 0 200 fffffc02fd68d640 raid0 rfnodeq
> 0 119 3 1 200 fffffc000086f2c0 npfgc0 npfgcw
> 0 118 3 0 200 fffffc000086ea40 rt_free rt_free
> 0 117 3 0 200 fffffc000086e600 unpgc unpgc
> 0 116 3 1 200 fffffc000086e1c0 icmp6_wqinput/1 icmp6_wqinput
> 0 115 3 0 200 fffffc0000787b00 icmp6_wqinput/0 icmp6_wqinput
> 0 114 3 1 200 fffffc00007876c0 ip6flow ip6flow
> 0 113 3 1 200 fffffc0000787280 nd6_timer nd6_timer
> 0 112 3 1 200 fffffc0000786e40 carp6_wqinput/1 carp6_wqinput
> 0 111 3 0 200 fffffc0000786a00 carp6_wqinput/0 carp6_wqinput
> 0 110 3 1 200 fffffc00007865c0 carp_wqinput/1 carp_wqinput
> 0 109 3 0 200 fffffc00007d4580 carp_wqinput/0 carp_wqinput
> 0 108 3 1 200 fffffc00007d49c0 icmp_wqinput/1 icmp_wqinput
> 0 107 3 0 200 fffffc00007d4e00 icmp_wqinput/0 icmp_wqinput
> 0 106 3 1 200 fffffc00007d5240 rt_timer rt_timer
> 0 105 3 0 200 fffffc00007d5680 ipflow_slowtimo ipflow_slowtimo
> 0 104 3 1 200 fffffc00007d5ac0 vmem_rehash vmem_rehash
> 0 103 3 1 200 fffffc0000786180 entbutler entropy
> 0 29 3 0 200 fffffc02fd68da80 iic0 iicintr
> 0 27 3 0 200 fffffc02fd68d200 scsibus2 sccomp
> 0 25 3 0 200 fffffc02fd68c980 scsibus1 sccomp
> 0 23 3 0 200 fffffc02fd68c100 scsibus0 sccomp
> 0 22 3 0 200 fffffc02fd713a40 atabus1 atath
> 0 21 3 0 200 fffffc02fd713600 atabus0 atath
> 0 20 3 1 200 fffffc02fd7131c0 xcall/1 xcall
> 0 19 1 1 200 fffffc02fd712d80 softser/1
> 0 18 1 1 200 fffffc02fd712940 softclk/1
> 0 17 1 1 200 fffffc02fd712500 softbio/1
> 0 16 1 1 200 fffffc02fd7120c0 softnet/1
> 0 > 15 1 1 201 fffffc02fef39a00 idle/1
> 0 14 3 0 200 fffffc02fef395c0 pmfsuspend pmfsuspend
> 0 13 3 0 200 fffffc02fef39180 pmfevent pmfevent
> 0 12 3 0 200 fffffc02fef38d40 sopendfree sopendfr
> 0 11 3 1 200 fffffc02fef38900 iflnkst iflnkst
> 0 10 3 0 200 fffffc02fef384c0 nfssilly nfssilly
> 0 9 3 0 240 fffffc02fef38080 vdrain vdrain
> 0 8 3 0 200 fffffc02ff74f9c0 modunload mod_unld
> 0 7 3 0 200 fffffc02ff74f580 xcall/0 xcall
> 0 6 1 0 200 fffffc02ff74f140 softser/0
> 0 5 1 0 200 fffffc02ff74ed00 softclk/0
> 0 4 1 0 200 fffffc02ff74e8c0 softbio/0
> 0 3 1 0 200 fffffc02ff74e480 softnet/0
> 0 2 1 0 201 fffffc02ff74e040 idle/0
> 0 0 3 1 200 fffffc00014d0f80 swapper uvm
-- thorpej