Subject: Re: Fix for SGImips interrupt nesting crashes
To: Rafal Boni <rafal@attbi.com>
From: sgimips NetBSD list <sgimips@mrynet.com>
List: port-sgimips
Date: 04/21/2002 21:47:14
Rafal, et al,
I've beaten up on the serial console on an INDY since you send this patch
out and I have yet to experience the panics with the patch.
I've rolled this patch into my latest NetBSD/sgimips "release" snapshot
which will be announced on the sgimips shortly.
Thanks for addressing this.
-scott
> Folks:
> Below is a simple patch to the sgimips cpu_intr() routine to prevent
> the arbitrary levels of interrupt nesting that were possible before
> and lead to really strange panic()s.
>
> I'm sending this to port-mips as well, as I know several of the
> other MIPS ports also have the same issue as I the sgi code was
> patterned on one of the other MIPS ports. The basic issue is
> that hardware interrupts are re-enabled before unwinding the
> current being used to service the current interrupt, potentially
> causing the kernel stack to overflow when interrupt activity is
> high.. See my previous message [*] for more info...
>
> I'd like to commit this soon as it causes frequent, odd and hard-
> to-diagnose panics when interrupt load is high (like when using
> serial console at speeds > 9600bps, mixing heavy disk activity
> with serial activity, etc.), but I'd also like to get people's
> feedback. Note that similar changes done to ip22.c would also
> need to be done in ip32.c (though I'm not currently worried as
> the O2 code has worse problems, I intent to update it neverthe-
> less).
>
> A quick glance indicates most of the mips ports *do* have similar
> problems; sbmips seems to avoid falling into this trap with some
> XXX'ed hackery. My quick glance seems to say that algor, arc,
> cobalt, evbmips/malta, hpcmips/{tx39,vr}, mipsco, newsmips and
> pmax could all have similar problems, though my check was very
> quick and crude, so I could be off.
>
> Finally, a quick comment about the second dangling splhigh() --
> the idea here is that once we've dropped the bits form the
> "current interrupts being serviced" mask, we're required to
> give up the stack space used to handle those interrupts, so
> we mask out all interrupts, reset the mask and then let the
> return from interrupt restore the previous interrupt mask.
> (ie, yes, it's intentional).
>
> Most importantly, thanks to all those who pitched in ideas
> on how to deal with this, especially Chuq, who helped get
> the ideas floating in my head much better organized 8-)
>
> --rafal
>
> [*] http://mail-index.netbsd.org/port-sgimips/2002/04/05/0001.html
>
> Index: ip22.c
> ===================================================================
> RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/ip22.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 ip22.c
> --- ip22.c 2002/03/13 13:12:29 1.8
> +++ ip22.c 2002/04/20 04:00:18
> @@ -46,6 +46,9 @@
> static struct evcnt mips_int5_evcnt =
> EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "int 5 (clock)");
>
> +static struct evcnt mips_spurint_evcnt =
> + EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "spurious interrupts");
> +
> static u_int32_t iocwrite; /* IOC write register: read-only */
> static u_int32_t iocreset; /* IOC reset register: read-only */
>
> @@ -60,7 +63,7 @@ void ip22_bus_reset(void);
> int ip22_local0_intr(void);
> int ip22_local1_intr(void);
> int ip22_mappable_intr(void *);
> -void ip22_intr(u_int, u_int, u_int, u_int);
> +void ip22_intr(u_int, u_int, u_int, u_int);
> void ip22_intr_establish(int, int, int (*)(void *), void *);
>
> unsigned long ip22_clkread(void);
> @@ -180,6 +183,7 @@ ip22_init(void)
> ticks_per_usec = cps * hz / 1000000;
>
> evcnt_attach_static(&mips_int5_evcnt);
> + evcnt_attach_static(&mips_spurint_evcnt);
> }
>
> void
> @@ -235,7 +239,8 @@ ip22_intr(status, cause, pc, ipending)
> cause &= ~MIPS_INT_MASK_4;
> }
>
> - _splset((status & ~cause & MIPS_HARD_INT_MASK) | MIPS_SR_INT_IE);
> + if (cause & MIPS_HARD_INT_MASK)
> + mips_spurint_evcnt.ev_count++;
> }
>
> int
> Index: machdep.c
> ===================================================================
> RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/machdep.c,v
> retrieving revision 1.34
> diff -u -p -r1.34 machdep.c
> --- machdep.c 2002/03/13 13:12:29 1.34
> +++ machdep.c 2002/04/20 04:00:18
> @@ -824,6 +824,8 @@ cpu_intr_establish(level, ipl, func, arg
> return (void *) -1;
> }
>
> +int servicing_mask = 0;
> +
> void
> cpu_intr(status, cause, pc, ipending)
> u_int32_t status;
> @@ -831,16 +833,32 @@ cpu_intr(status, cause, pc, ipending)
> u_int32_t pc;
> u_int32_t ipending;
> {
> +#ifdef DIAGNOSTIC
> + if (ipending != 0 && (ipending & ~servicing_mask) == 0)
> + panic("Re-entrancy in cpu_intr: pending %x, servicing %x\n",
> + ipending, servicing_mask);
> +#endif
> +
> uvmexp.intrs++;
>
> if (ipending & MIPS_HARD_INT_MASK)
> (*platform.iointr)(status, cause, pc, ipending);
>
> /* software simulated interrupt */
> - if ((ipending & MIPS_SOFT_INT_MASK_1)
> - || (ssir && (status & MIPS_SOFT_INT_MASK_1))) {
> + if (servicing_mask == 0 &&
> + ((ipending & MIPS_SOFT_INT_MASK_1)
> + || (ssir && (status & MIPS_SOFT_INT_MASK_1)))) {
> +
> + splhigh();
> + servicing_mask |= ipending;
> + _splset(MIPS_SR_INT_IE |
> + (status & ~servicing_mask & MIPS_HARD_INT_MASK));
> +
> _clrsoftintr(MIPS_SOFT_INT_MASK_1);
> softintr_dispatch();
> +
> + splhigh();
> + servicing_mask &= ~ipending;
> }
> }
>
> ----
> Rafal Boni rafal@attbi.com
> We are all worms. But I do believe I am a glowworm. -- Winston Churchill
>