Subject: Re: Fix for SGImips interrupt nesting crashes
To: Rafal Boni <rafal@attbi.com>
From: sgimips NetBSD list <sgimips@mrynet.com>
List: port-mips
Date: 04/21/2002 21:47:14
Rafal, et al,

I've beaten up on the serial console on an INDY since you send this patch
out and I have yet to experience the panics with the patch.

I've rolled this patch into my latest NetBSD/sgimips "release" snapshot
which will be announced on the sgimips shortly.

Thanks for addressing this.

-scott

> Folks:
> 	Below is a simple patch to the sgimips cpu_intr() routine to prevent
> 	the arbitrary levels of interrupt nesting that were possible before
> 	and lead to really strange panic()s.
> 
> 	I'm sending this to port-mips as well, as I know several of the
> 	other MIPS ports also have the same issue as I the sgi code was
> 	patterned on one of the other MIPS ports.  The basic issue is
> 	that hardware interrupts are re-enabled before unwinding the
> 	current being used to service the current interrupt, potentially
> 	causing the kernel stack to overflow when interrupt activity is
> 	high.. See my previous message [*] for more info...
> 
> 	I'd like to commit this soon as it causes frequent, odd and hard-
> 	to-diagnose panics when interrupt load is high (like when using
> 	serial console at speeds > 9600bps, mixing heavy disk activity
> 	with serial activity, etc.), but I'd also like to get people's
> 	feedback.  Note that similar changes done to ip22.c would also
> 	need to be done in ip32.c (though I'm not currently worried as
> 	the O2 code has worse problems, I intent to update it neverthe-
> 	less).
> 
> 	A quick glance indicates most of the mips ports *do* have similar
> 	problems; sbmips seems to avoid falling into this trap with some
> 	XXX'ed hackery.  My quick glance seems to say that algor, arc,
> 	cobalt, evbmips/malta, hpcmips/{tx39,vr}, mipsco, newsmips and
> 	pmax could all have similar problems, though my check was very
> 	quick and crude, so I could be off.
> 
> 	Finally, a quick comment about the second dangling splhigh() --
> 	the idea here is that once we've dropped the bits form the
> 	"current interrupts being serviced" mask, we're required to
> 	give up the stack space used to handle those interrupts, so
> 	we mask out all interrupts, reset the mask and then let the
> 	return from interrupt restore the previous interrupt mask.
> 	(ie, yes, it's intentional).
> 
> 	Most importantly, thanks to all those who pitched in ideas
> 	on how to deal with this, especially Chuq, who helped get
> 	the ideas floating in my head much better organized 8-)
> 
> --rafal
> 
> [*] http://mail-index.netbsd.org/port-sgimips/2002/04/05/0001.html
> 
> Index: ip22.c
> ===================================================================
> RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/ip22.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 ip22.c
> --- ip22.c	2002/03/13 13:12:29	1.8
> +++ ip22.c	2002/04/20 04:00:18
> @@ -46,6 +46,9 @@
>  static struct evcnt mips_int5_evcnt =
>      EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "int 5 (clock)");
>  
> +static struct evcnt mips_spurint_evcnt =
> +    EVCNT_INITIALIZER(EVCNT_TYPE_INTR, NULL, "mips", "spurious interrupts");
> +
>  static u_int32_t iocwrite;	/* IOC write register: read-only */
>  static u_int32_t iocreset;	/* IOC reset register: read-only */
>  
> @@ -60,7 +63,7 @@ void 		ip22_bus_reset(void);
>  int 		ip22_local0_intr(void);
>  int		ip22_local1_intr(void);
>  int 		ip22_mappable_intr(void *);
> -void 		ip22_intr(u_int, u_int, u_int, u_int);
> +void		ip22_intr(u_int, u_int, u_int, u_int);
>  void		ip22_intr_establish(int, int, int (*)(void *), void *);
>  
>  unsigned long 	ip22_clkread(void);
> @@ -180,6 +183,7 @@ ip22_init(void)
>  	ticks_per_usec = cps * hz / 1000000;
>  
>  	evcnt_attach_static(&mips_int5_evcnt);
> +	evcnt_attach_static(&mips_spurint_evcnt);
>  }
>  
>  void
> @@ -235,7 +239,8 @@ ip22_intr(status, cause, pc, ipending)
>  		cause &= ~MIPS_INT_MASK_4;
>  	}
>  
> -	_splset((status & ~cause & MIPS_HARD_INT_MASK) | MIPS_SR_INT_IE);
> +	if (cause & MIPS_HARD_INT_MASK) 
> +		mips_spurint_evcnt.ev_count++;
>  }
>  
>  int
> Index: machdep.c
> ===================================================================
> RCS file: /cvsroot/syssrc/sys/arch/sgimips/sgimips/machdep.c,v
> retrieving revision 1.34
> diff -u -p -r1.34 machdep.c
> --- machdep.c	2002/03/13 13:12:29	1.34
> +++ machdep.c	2002/04/20 04:00:18
> @@ -824,6 +824,8 @@ cpu_intr_establish(level, ipl, func, arg
>  	return (void *) -1;
>  }
>  
> +int servicing_mask = 0;
> +
>  void
>  cpu_intr(status, cause, pc, ipending)
>  	u_int32_t status;
> @@ -831,16 +833,32 @@ cpu_intr(status, cause, pc, ipending)
>  	u_int32_t pc;
>  	u_int32_t ipending;
>  {
> +#ifdef DIAGNOSTIC
> +	if (ipending != 0 && (ipending & ~servicing_mask) == 0)
> +	    panic("Re-entrancy in cpu_intr: pending %x, servicing %x\n",
> +			ipending, servicing_mask);
> +#endif
> +
>  	uvmexp.intrs++;
>  
>  	if (ipending & MIPS_HARD_INT_MASK)
>  		(*platform.iointr)(status, cause, pc, ipending);
>  
>  	/* software simulated interrupt */
> -	if ((ipending & MIPS_SOFT_INT_MASK_1)
> -		    || (ssir && (status & MIPS_SOFT_INT_MASK_1))) {
> +	if (servicing_mask == 0 &&
> +		((ipending & MIPS_SOFT_INT_MASK_1) 
> +		 || (ssir && (status & MIPS_SOFT_INT_MASK_1)))) {
> +
> +		splhigh();
> +		servicing_mask |= ipending;
> +		_splset(MIPS_SR_INT_IE |
> +			    (status & ~servicing_mask & MIPS_HARD_INT_MASK));
> +
>  		_clrsoftintr(MIPS_SOFT_INT_MASK_1);
>  		softintr_dispatch();
> +
> +		splhigh();
> +		servicing_mask &= ~ipending;
>  	}
>  }
>  
> ----
> Rafal Boni                                                     rafal@attbi.com
>   We are all worms.  But I do believe I am a glowworm.  -- Winston Churchill
>