tech-kern archive
Re: assertion "spc->spc_migrating == NULL" failed
hi,
> Hello,
> working with a source code based on the matt-nb5-mips64 branch,
> I can reproduce this panic:
> panic: kernel diagnostic assertion "spc->spc_migrating == NULL" failed: file
> "/dsk/l1/misc/bouyer/tmp/src/sys/kern/kern_synch.c", line 656
> mttycn_pollc 1 ipl 0x6
> Stopped in pid 0.4 (system) at netbsd:cpu_Debugger+0x4: jr ra
> bdslot: nop
> db{0}> tr
> cpu_Debugger+4 (c04bd000,b300,10,c0407c00) ra c02192ac sz 0
> panic+1d4 (c04bd000,c02de430,c02f1450,c02f1360) ra c02cac78 sz 48
> __kernassert+48 (c04bd000,c02de430,c02f1450,c02f1360) ra c01f74a4 sz 32
> mi_switch+640 (c04bd000,c02de430,c02f1450,c02f1360) ra c01f3130 sz 64
> sleepq_block+f0 (c04bd000,c02de430,c02f1450,c02f1360) ra c0202f54 sz 48
> turnstile_block+2d0 (c04bd000,c02de430,c02f1450,c02f1360) ra c01e254c sz 56
> mutex_vector_enter+268 (c04bd000,c02de430,c02f1450,c02f1360) ra c026e2cc sz 64
> wapbl_biodone+48 (c04bd000,c02de430,c02f1450,c02f1360) ra c0255638 sz 48
> biodone2+a4 (c04bd000,c02de430,c02f1450,c02f1360) ra c02557c8 sz 32
> biointr+ac (c04bd000,c02de430,c02f1450,c02f1360) ra c01f3acc sz 32
> softint_dispatch+c4 (c04bd000,c02de430,c02f1450,c02f1360) ra c0295fe4 sz 72
> softint_fast_dispatch+80 (0,c02de430,c02f1450,c02f1360) ra 0 sz 24
> User-level: pid 0.4
>
>
> (The soft int may vary). Looking at the sources, I see that
> sched_nextlwp() is careful not to propose a new lwp if a migration is in
> progress. But when this KASSERT fires we're not necessarily about to
> switch to a new (non-idle) lwp; rather, the current lwp got woken up by
> another CPU while it was about to switch.
>
> Shouldn't
> KASSERT(spc->spc_migrating == NULL);
> if (l->l_target_cpu != NULL) {
> spc->spc_migrating = l;
> }
> be instead:
> if (l->l_target_cpu != NULL) {
> KASSERT(spc->spc_migrating == NULL);
> spc->spc_migrating = l;
> }
>
> I made the above change and it seems to work. Can someone confirm this is
> correct?
I think you're correct.
I have had the attached patch in my local tree for a long time.
I haven't committed it because I haven't been able to reproduce the problem on my machine yet.
YAMAMOTO Takashi
>
> --
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
Index: kern_synch.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_synch.c,v
retrieving revision 1.284
diff -u -p -r1.284 kern_synch.c
--- kern_synch.c 2 Nov 2010 15:17:37 -0000 1.284
+++ kern_synch.c 23 Nov 2010 22:16:57 -0000
@@ -654,9 +654,22 @@ mi_switch(lwp_t *l)
l->l_stat = LSRUN;
lwp_setlock(l, spc->spc_mutex);
sched_enqueue(l, true);
- /* Handle migration case */
- KASSERT(spc->spc_migrating == NULL);
- if (l->l_target_cpu != NULL) {
+#if 1
+ if (spc->spc_migrating != NULL) {
+ printf("%s: bug %p %p %p\n", __func__, l, newl, spc);
+ }
+#endif
+ /*
+ * Handle migration case
+ *
+ * spc_migrating != NULL here means that a softint
+ * which interrupted the idle lwp is blocking.
+ */
+ KASSERT(spc->spc_migrating == NULL ||
+ ((l->l_pflag & LP_INTR) != 0 &&
+ newl != NULL && (newl->l_flag & LW_IDLE) != 0));
+ if (l->l_target_cpu != NULL) {
+ KASSERT((l->l_pflag & LP_INTR) == 0);
spc->spc_migrating = l;
}
} else
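
For comparison, here is a small userland sketch of the three checks discussed in this thread: the original KASSERT, the variant from the quoted mail, and the two KASSERTs of the patch above folded into one predicate. It is not kernel code. The struct layouts are cut down to the fields the checks read, struct schedstate and the check_*() and demo() names are made up for this sketch, l_target_cpu is a plain void pointer here, and the LP_INTR/LW_IDLE values are placeholders rather than the real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for this sketch; the kernel's real values differ. */
#define LP_INTR 0x01	/* lwp is a soft interrupt handler */
#define LW_IDLE 0x02	/* lwp is the per-CPU idle lwp */

/* Simplified stand-ins carrying only the fields the checks read. */
struct lwp {
	int l_pflag;
	int l_flag;
	void *l_target_cpu;	/* non-NULL when a migration was requested */
};

struct schedstate {
	struct lwp *spc_migrating;
};

/* Original assertion: no migration may be pending on this CPU. */
static bool
check_original(const struct schedstate *spc)
{
	return spc->spc_migrating == NULL;
}

/* Variant from the quoted mail: assert only when l itself will migrate. */
static bool
check_quoted(const struct schedstate *spc, const struct lwp *l)
{
	return l->l_target_cpu == NULL || spc->spc_migrating == NULL;
}

/* Both assertions of the patch, combined into one predicate. */
static bool
check_patch(const struct schedstate *spc, const struct lwp *l,
    const struct lwp *newl)
{
	bool softint_over_idle = (l->l_pflag & LP_INTR) != 0 &&
	    newl != NULL && (newl->l_flag & LW_IDLE) != 0;

	/* A pending migration is tolerated only for a softint over idle. */
	if (spc->spc_migrating != NULL && !softint_over_idle)
		return false;
	/* A softint lwp must never be the one that migrates. */
	if (l->l_target_cpu != NULL && (l->l_pflag & LP_INTR) != 0)
		return false;
	return true;
}

/* Evaluate the checks with a migration already pending on the CPU. */
static void
demo(void)
{
	int cpu = 0;	/* dummy migration target */
	struct lwp migrating = { 0, 0, &cpu };
	struct lwp softint = { LP_INTR, 0, NULL };
	struct lwp ordinary = { 0, 0, NULL };
	struct lwp idle = { 0, LW_IDLE, NULL };
	struct schedstate spc = { &migrating };

	/* Softint switching away, idle lwp chosen as the next lwp. */
	assert(!check_original(&spc));
	assert(check_quoted(&spc, &softint));
	assert(check_patch(&spc, &softint, &idle));

	/* Ordinary lwp with no migration target of its own. */
	assert(check_quoted(&spc, &ordinary));
	assert(!check_patch(&spc, &ordinary, &idle));
}
```

In this model the variant from the quoted mail accepts any lwp that has no migration target of its own, while the patch's condition accepts a pending migration only when a softint lwp is switching to the idle lwp, so it still fires for other callers.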