Subject: Fix for SGImips interrupt nesting crashes
To: None <port-mips@netbsd.org, port-sgimips@netbsd.org>
From: Rafal Boni <rafal@attbi.com>
List: port-sgimips
Date: 04/20/2002 00:10:06
Folks:
Below is a simple patch to the sgimips cpu_intr() routine to prevent
the arbitrary levels of interrupt nesting that were possible before
and lead to really strange panic()s.
I'm sending this to port-mips as well, as I know several of the
other MIPS ports also have the same issue as I the sgi code was
patterned on one of the other MIPS ports. The basic issue is
that hardware interrupts are re-enabled before unwinding the
current being used to service the current interrupt, potentially
causing the kernel stack to overflow when interrupt activity is
high.. See my previous message [*] for more info...
I'd like to commit this soon as it causes frequent, odd and hard-
to-diagnose panics when interrupt load is high (like when using
serial console at speeds > 9600bps, mixing heavy disk activity
with serial activity, etc.), but I'd also like to get people's
feedback. Note that similar changes done to ip22.c would also
need to be done in ip32.c (though I'm not currently worried as
the O2 code has worse problems, I intent to update it neverthe-
less).
A quick glance indicates most of the mips ports *do* have similar
problems; sbmips seems to avoid falling into this trap with some
XXX'ed hackery. My quick glance seems to say that algor, arc,
cobalt, evbmips/malta, hpcmips/{tx39,vr}, mipsco, newsmips and
pmax could all have similar problems, though my check was very
quick and crude, so I could be off.
Finally, a quick comment about the second dangling splhigh() --
the idea here is that once we've dropped the bits form the
"current interrupts being serviced" mask, we're required to
give up the stack space used to handle those interrupts, so
we mask out all interrupts, reset the mask and then let the
return from interrupt restore the previous interrupt mask.
(ie, yes, it's intentional).
Most importantly, thanks to all those who pitched in ideas
on how to deal with this, especially Chuq, who helped get
the ideas floating in my head much better organized 8-)
--rafal
[*] http://mail-index.netbsd.org/port-sgimips/2002/04/05/0001.html
Index: ip22.c
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/ip22.c,v
retrieving revision 1.8
diff -u -p -r1.8 ip22.c
--- ip22.c 2002/03/13 13:12:29 1.8
+++ ip22.c 2002/04/20 04:00:18
@@ -46,6 +46,9 @@
static struct evcnt mips_int5_evcnt =
EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "int 5 (clock)");
+static struct evcnt mips_spurint_evcnt =
+ EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "spurious interrupts");
+
static u_int32_t iocwrite; /* IOC write register: read-only */
static u_int32_t iocreset; /* IOC reset register: read-only */
@@ -60,7 +63,7 @@ void ip22_bus_reset(void);
int ip22_local0_intr(void);
int ip22_local1_intr(void);
int ip22_mappable_intr(void *);
-void ip22_intr(u_int, u_int, u_int, u_int);
+void ip22_intr(u_int, u_int, u_int, u_int);
void ip22_intr_establish(int, int, int (*)(void *), void *);
unsigned long ip22_clkread(void);
@@ -180,6 +183,7 @@ ip22_init(void)
ticks_per_usec = cps * hz / 1000000;
evcnt_attach_static(&mips_int5_evcnt);
+ evcnt_attach_static(&mips_spurint_evcnt);
}
void
@@ -235,7 +239,8 @@ ip22_intr(status, cause, pc, ipending)
cause &= ~MIPS_INT_MASK_4;
}
- _splset((status & ~cause & MIPS_HARD_INT_MASK) | MIPS_SR_INT_IE);
+ if (cause & MIPS_HARD_INT_MASK)
+ mips_spurint_evcnt.ev_count++;
}
int
Index: machdep.c
===================================================================
RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/machdep.c,v
retrieving revision 1.34
diff -u -p -r1.34 machdep.c
--- machdep.c 2002/03/13 13:12:29 1.34
+++ machdep.c 2002/04/20 04:00:18
@@ -824,6 +824,8 @@ cpu_intr_establish(level, ipl, func, arg
return (void *) -1;
}
+int servicing_mask = 0;
+
void
cpu_intr(status, cause, pc, ipending)
u_int32_t status;
@@ -831,16 +833,32 @@ cpu_intr(status, cause, pc, ipending)
u_int32_t pc;
u_int32_t ipending;
{
+#ifdef DIAGNOSTIC
+ if (ipending != 0 && (ipending & ~servicing_mask) == 0)
+ panic("Re-entrancy in cpu_intr: pending %x, servicing %x\n",
+ ipending, servicing_mask);
+#endif
+
uvmexp.intrs++;
if (ipending & MIPS_HARD_INT_MASK)
(*platform.iointr)(status, cause, pc, ipending);
/* software simulated interrupt */
- if ((ipending & MIPS_SOFT_INT_MASK_1)
- || (ssir && (status & MIPS_SOFT_INT_MASK_1))) {
+ if (servicing_mask == 0 &&
+ ((ipending & MIPS_SOFT_INT_MASK_1)
+ || (ssir && (status & MIPS_SOFT_INT_MASK_1)))) {
+
+ splhigh();
+ servicing_mask |= ipending;
+ _splset(MIPS_SR_INT_IE |
+ (status & ~servicing_mask & MIPS_HARD_INT_MASK));
+
_clrsoftintr(MIPS_SOFT_INT_MASK_1);
softintr_dispatch();
+
+ splhigh();
+ servicing_mask &= ~ipending;
}
}
----
Rafal Boni rafal@attbi.com
We are all worms. But I do believe I am a glowworm. -- Winston Churchill