NetBSD-Bugs archive
Re: port-alpha/38335 (kernel freeze on alpha MP system)
The following reply was made to PR port-alpha/38335; it has been noted by GNATS.
From: Jarle Greipsland <jarle%uninett.no@localhost>
To: mhitch%lightning.msu.montana.edu@localhost
Cc: gnats-bugs%NetBSD.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Wed, 28 Oct 2009 16:01:33 +0100 (CET)
"Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
> OK, here's something else to try. I was looking through the alpha
> hardware reference manual and checking some of the multiprocessor
> information. I noted that it showed the use of memory barriers when
> sending/receiving interrupts between processors. It looks like the atomic
> operations that were used in the netbsd-4 branch included the memory
> barrier, but the ones used in netbsd-5 and later do not. This patch
> should add back the memory barriers needed for the IPI stuff.
[ ... ]
OK, I have applied your patch (and removed the old ones, except
for the one that generates the "Whoa!" warnings). The kernel
I'm running is GENERIC.MP, based on -current from Oct 14th, with
the following diff:
----------------------------------------------------------------------
Index: arch/alpha/alpha/ipifuncs.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/ipifuncs.c,v
retrieving revision 1.40
diff -u -r1.40 ipifuncs.c
--- arch/alpha/alpha/ipifuncs.c 28 Apr 2008 20:23:10 -0000 1.40
+++ arch/alpha/alpha/ipifuncs.c 28 Oct 2009 14:55:46 -0000
@@ -130,7 +130,7 @@
return;
}
#endif
-
+ alpha_mb();
pending_ipis = atomic_swap_ulong(&ci->ci_ipis, 0);
/*
@@ -167,6 +167,7 @@
#endif
atomic_or_ulong(&cpu_info[cpu_id]->ci_ipis, ipimask);
+ alpha_mb();
alpha_pal_wripir(cpu_id);
}
Index: arch/alpha/alpha/pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/alpha/alpha/pmap.c,v
retrieving revision 1.243
diff -u -r1.243 pmap.c
--- arch/alpha/alpha/pmap.c 4 Oct 2009 17:00:31 -0000 1.243
+++ arch/alpha/alpha/pmap.c 28 Oct 2009 14:55:46 -0000
@@ -3699,6 +3699,12 @@
* don't really have to do anything else.
*/
mutex_spin_enter(&pq->pq_lock);
+ if (pj && pj == pq->pq_head.tqh_first) {
+ printf("Whoa! pool_cache_get returned an in-use entry! ci_index %d pj %p\n",
+ self->ci_index, pj);
+/**/ /* panic("Oops"); */
+ pj = NULL;
+ }
pq->pq_pte |= pte;
if (pq->pq_tbia) {
mutex_spin_exit(&pq->pq_lock);
----------------------------------------------------------------------
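The ordering the ipifuncs.c hunks are after can be sketched in portable
C11 as follows. This is an illustration only, not the NetBSD code:
pending_ipis, ipi_requests, raise_interrupt(), send_ipi() and
process_ipi() are hypothetical names, the seq_cst fence merely stands in
for alpha_mb(), and raise_interrupt() stands in for alpha_pal_wripir().

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPU 2

/* Hypothetical stand-in for the per-CPU ci_ipis word. */
static _Atomic unsigned long pending_ipis[NCPU];

/* Hypothetical stand-in for alpha_pal_wripir(): only counts requests. */
static unsigned long ipi_requests[NCPU];

static void raise_interrupt(unsigned int cpu_id)
{
    ipi_requests[cpu_id]++;
}

/* Sender: publish the request bits, fence, then interrupt the target. */
static void send_ipi(unsigned int cpu_id, unsigned long ipimask)
{
    assert(cpu_id < NCPU);
    atomic_fetch_or_explicit(&pending_ipis[cpu_id], ipimask,
        memory_order_relaxed);
    /* Full fence: plays the role of alpha_mb() before the interrupt. */
    atomic_thread_fence(memory_order_seq_cst);
    raise_interrupt(cpu_id);
}

/* Receiver: fence, then atomically fetch and clear the pending bits. */
static unsigned long process_ipi(unsigned int cpu_id)
{
    assert(cpu_id < NCPU);
    /* Full fence: plays the role of alpha_mb() before the swap. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_exchange_explicit(&pending_ipis[cpu_id], 0UL,
        memory_order_relaxed);
}
```

The point of the sender-side fence is that the mask update must be
visible to the target CPU before the interrupt is raised; the
receiver-side fence mirrors it before the mask is read and cleared.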
And it both "Whoa!"s and panics:
----------------------------------------------------------------------
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003f9ef980
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003d8bfc00
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 1 pj
0xfffffc003f9ee400
Whoa! pool_cache_get returned an in-use entry! ci_index 0 pj
0xfffffc003f9ee400
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x2 (memory management fault)
CPU 1 a0 = 0x40
CPU 1 a1 = 0x1
CPU 1 a2 = 0x0
CPU 1 pc = 0xfffffc00007371a8
CPU 1 ra = 0xfffffc0000737118
CPU 1 pv = 0xfffffc00005f6130
CPU 1 curlwp = 0xfffffc003f960800
CPU 1 pid = 0, comm = system
panic: trap
Stopped in pid 0.37 (system) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
db{1}> tr
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x268
trap() at netbsd:trap+0x35c
XentMM() at netbsd:XentMM+0x20
--- memory management fault (from ipl 5) ---
pmap_do_tlb_shootdown() at netbsd:pmap_do_tlb_shootdown+0xe8
alpha_ipi_process() at netbsd:alpha_ipi_process+0xb8
interrupt() at netbsd:interrupt+0x88
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
mutex_spin_exit() at netbsd:mutex_spin_exit+0x5c
pmap_tlb_shootdown() at netbsd:pmap_tlb_shootdown+0x170
pmap_kremove() at netbsd:pmap_kremove+0xac
uvm_pagermapout() at netbsd:uvm_pagermapout+0x40
uvm_aio_aiodone() at netbsd:uvm_aio_aiodone+0xd4
db{1}> show reg
v0 0xfffffe0000034800
t0 0x1
t1 0x1
t2 0xfffffc003ff48000
t3 0
t4 0
t5 0xfffffc0000b46a3d __func__.21238+0x91c
t6 0
t7 0
s0 0xfffffc0000c378e0 msgbufenabled
s1 0x104
s2 0xfffffc0000c350a8 db_onpanic
s3 0xfffffc00009bf5fc reg_to_frame+0x5c8
s4 0xfffffe0013923a38
s5 0x40
s6 0xfffffc003f960800
a0 0x5
a1 0xfffffd01fc0003f8
a2 0
a3 0x8
a4 0x3
a5 0xfffffe0000000008
t8 0xfffffe00139237ff
t9 0x8
t10 0x3ea0a5
t11 0x1ff800
ra 0xfffffc000080c5b8 panic+0x268
t12 0xfffffc00003eb590 cpu_Debugger
at 0x12002438c
gp 0xfffffc0000c30928 __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
sp 0xfffffe0013923888
pc 0xfffffc00003eb594 cpu_Debugger+0x4
ps 0x6
ai 0x1ff800
pv 0xfffffc00003eb590 cpu_Debugger
netbsd:cpu_Debugger+0x4: ret zero,(ra)
db{1}> mach cpu 0
Using CPU 0
db{1}> tr
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x2 (memory management fault)
CPU 1 a0 = 0xffffffffffffffd9
CPU 1 a1 = 0x1
CPU 1 a2 = 0x0
CPU 1 pc = 0xfffffc00003ee944
CPU 1 ra = 0xfffffc00003e8104
CPU 1 pv = 0xfffffc00003ee890
CPU 1 curlwp = 0xfffffc003f960800
CPU 1 pid = 0, comm = system
Caught exception in ddb.
db{1}> show reg
v0 0
t0 0
t1 0xfffffc003fe29c00
t2 0xfffffe0012c8a400
t3 0
t4 0
t5 0xfffffc003fe29c60
t6 0xfffffc0000c68410 kernel_pmap_store+0x50
t7 0
s0 0x1
s1 0xfffffc003fe29c00
s2 0xfffffc0000c0de60 cpu_info_primary+0x38
s3 0xfffffc0000c0de28 cpu_info_primary
s4 0
s5 0
s6 0
a0 0
a1 0
a2 0xfffffe0012c9a000
a3 0x1
a4 0xfffffc0000c94308 uvm_fpageqlock
a5 0
t8 0x1604db790
t9 0
t10 0xffffffff
t11 0xfffffc003f90a8b8
ra 0xfffffc00005eac88 idle_loop+0x1b8
t12 0xfffffc000060e200 kpreempt_enable
at 0xfffffe0013984000
gp 0xfffffc0000c30928 __link_set_prop_linkpools_sym__link__prop_array_pool+0x8008
sp 0x1
pc 0xfffffc00005eac34 idle_loop+0x164
ps 0
ai 0xfffffc003f90a8b8
pv 0xfffffc000060e200 kpreempt_enable
netbsd:idle_loop+0x164: ldq pv,-1d30(gp)
----------------------------------------------------------------------
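The debugging check in the pmap.c hunk only compares pj against the head
of the queue (pq->pq_head.tqh_first). A stricter variant would look for
the entry anywhere on the queue. A minimal sketch of that idea, assuming
a hand-rolled singly linked list rather than the kernel's TAILQ; struct
job, struct job_queue, queue_push() and job_in_use() are hypothetical
names, not the NetBSD structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified shootdown job; not the NetBSD structure. */
struct job {
    struct job *next;
};

/* Hypothetical per-CPU queue of jobs that are currently in use. */
struct job_queue {
    struct job *head;
};

/* Link a job at the front of the queue. */
static void queue_push(struct job_queue *q, struct job *j)
{
    assert(q != NULL && j != NULL);
    j->next = q->head;
    q->head = j;
}

/* Return true if j is already linked anywhere on the queue. */
static bool job_in_use(const struct job_queue *q, const struct job *j)
{
    for (const struct job *p = q->head; p != NULL; p = p->next) {
        if (p == j)
            return true;
    }
    return false;
}
```

A walk like this costs time proportional to the queue length on every
allocation, so it only makes sense as a temporary diagnostic.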
So, no cigar this time. Anything else I should try?
-jarle
--
Q: What's the difference between programming and bug collecting?
A: None.