Subject: Re: NMI interrupts
To: None <mcr@gateway.sandelman.ocunix.on.ca>
From: Gordon W. Ross <gwr@mc.com>
List: port-sun3
Date: 02/03/1996 22:00:24
> Date: Sat, 03 Feb 1996 18:30:39 -0500
> From: Michael Richardson <mcr@gateway.sandelman.ocunix.on.ca>
> Under a load (compiling mh, downloading tk today), my sun3 seems to
> die with an NMI interrupt received:
>
> login: nmi interrupt received
> Stopped at _Debugger+0x6: unlk a6
> db> trace
> _Debugger(e080214,ee5bfd8,e08031a,0,0) + 6
> _nmi_intr(0,0,2e87e,dffe9e0,dffe994) + 22
> _isr_autovec(7c) + 68
> __isr_autovec() + a
> db>
>
> At this point, if I continue, then the machine freezes, requiring a
> cold boot. (power switch... BREAK on ttya doesn't cut it)
>
> Ideas? What is NMI hooked up to?
An NMI is caused by either a memory error or the clock. The NMI clock
is disabled, so this must be a memory error. To find out precisely what
caused it, look at the "memory error register", which is a location
in OBIO space. The register is mapped at the address shown in
prom_mappings[4] (the entry at offset 0x10 in the dump below), and you
want two words at that address, i.e.:
db> x/xx prom_mappings
_prom_mappings: fe00000 fe02000
db>
_prom_mappings+0x8: fe04000 fe06000
db>
_prom_mappings+0x10: fe08000 fe0a000
db> x/xx 0xfe08000
0xfe08000: 50ffffff 6e07f888
db> c
The bits in that first byte (0x50) are:
/*
* Bits for the memory error register when used as parity error register
*/
#define PER_INTR 0x80 /* r/o - 1 = parity interrupt pending */
#define PER_INTENA 0x40 /* r/w - 1 = enable interrupt on parity error */
#define PER_TEST 0x20 /* r/w - 1 = write inverse parity */
#define PER_CHECK 0x10 /* r/w - 1 = enable parity checking */
#define PER_ERR24 0x08 /* r/o - 1 = parity error <24..31> */
#define PER_ERR16 0x04 /* r/o - 1 = parity error <16..23> */
#define PER_ERR08 0x02 /* r/o - 1 = parity error <8..15> */
#define PER_ERR00 0x01 /* r/o - 1 = parity error <0..7> */
/*
* Bits for the memory error register when used as ECC error register
*/
#define EER_INTR 0x80 /* r/o - ECC memory interrupt pending */
#define EER_INTENA 0x40 /* r/w - enable interrupts on errors */
#define EER_BUSHOLD 0x20 /* r/w - hold memory bus mastership */
#define EER_CE_ENA 0x10 /* r/w - enable CE recording */
#define EER_TIMEOUT 0x08 /* r/o - Sirius bus time out */
#define EER_WBACKERR 0x04 /* r/o - write back error */
#define EER_UE 0x02 /* r/o - UE, uncorrectable error */
#define EER_CE 0x01 /* r/o - CE, correctable (single bit) error */
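As a rough illustration of reading these bits by hand, here is a minimal
sketch that decodes the first byte of the register using the parity
definitions above. The helper names memerr_first_byte() and per_decode()
are made up for this example and are not part of the kernel. It assumes
the machine has parity memory; on an ECC machine the same byte would be
read with the EER_* names instead. It also relies on the 68k being
big-endian, so the first byte is the top 8 bits of the first word
printed by "x/xx".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parity-mode bits, copied from the definitions above. */
#define PER_INTR   0x80
#define PER_INTENA 0x40
#define PER_TEST   0x20
#define PER_CHECK  0x10
#define PER_ERR24  0x08
#define PER_ERR16  0x04
#define PER_ERR08  0x02
#define PER_ERR00  0x01

struct per_bit {
	uint8_t mask;
	const char *name;
};

static const struct per_bit per_bits[] = {
	{ PER_INTR,   "PER_INTR"   },
	{ PER_INTENA, "PER_INTENA" },
	{ PER_TEST,   "PER_TEST"   },
	{ PER_CHECK,  "PER_CHECK"  },
	{ PER_ERR24,  "PER_ERR24"  },
	{ PER_ERR16,  "PER_ERR16"  },
	{ PER_ERR08,  "PER_ERR08"  },
	{ PER_ERR00,  "PER_ERR00"  },
};

/*
 * Hypothetical helper: extract the first (most significant) byte
 * from the first 32-bit word shown by the debugger.
 */
static uint8_t
memerr_first_byte(uint32_t word)
{
	return (uint8_t)(word >> 24);
}

/*
 * Hypothetical helper: write the names of all set bits into buf,
 * separated by '|', highest bit first.  Returns buf.
 */
static const char *
per_decode(uint8_t reg, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(per_bits) / sizeof(per_bits[0]); i++) {
		if (reg & per_bits[i].mask) {
			if (buf[0] != '\0')
				strncat(buf, "|", len - strlen(buf) - 1);
			strncat(buf, per_bits[i].name, len - strlen(buf) - 1);
		}
	}
	return buf;
}
```

Calling per_decode(memerr_first_byte(0x50ffffff), buf, sizeof buf) on
the value from the session above yields "PER_INTENA|PER_CHECK", i.e.
0x40 plus 0x10. A table of masks and names keeps the decoder in step
with the header: adding the EER_* variant would only need a second
table.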
Someday, we need to make the nmi handler figure out what caused
the NMI and print out the memory error address, bit, etc.
Gordon