Subject: Re: port-alpha/5546: port-alpha/lost a stack? exception_restore_regs bombs
To: None <mjacob@nas.nasa.gov>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-alpha
Date: 06/05/1998 12:23:51
On Fri, 5 Jun 1998 12:12:25 -0700 (PDT)
mjacob@nas.nasa.gov wrote:
> >Description:
> Running on a 128MB Alpha 8200, running a moderate disk exerciser,
> the system panic'ed (here's the extended printout, with the
> PAL logout area registers; perhaps Ross, who knows PAL code
> better, could tell us in more detail what cooks; this
> printout code isn't checked in yet):
If the stack went away, you would get a "Kernel Stack Not Valid Halt"; no
machine check would occur. If the stack pointer were merely invalid, you'd
get a memory management fault instead.
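The reasoning above is essentially a lookup from observed symptom to likely cause. A minimal sketch, assuming the standard Alpha SCB error vectors (0x620, 0x630, 0x660, 0x670; the report shows 0x670). The helper name `likely_cause` and its symptom strings are hypothetical, chosen only to illustrate the argument.

```python
# Alpha SCB vectors used for error reporting (the report above shows 0x670).
SCB_ERROR_VECTORS = {
    0x620: "system correctable error",
    0x630: "processor correctable error",
    0x660: "system machine check",
    0x670: "processor machine check",
}


def likely_cause(symptom, vector=None):
    """Map an observed failure symptom to its most plausible cause.

    Hypothetical helper: the symptom strings are illustrative labels,
    not anything the kernel or console actually prints.
    """
    if symptom == "ksnv_halt":
        # The kernel stack itself is gone: the CPU halts, no machine check.
        return "kernel stack not valid"
    if symptom == "mm_fault":
        # A bad pointer dereference is caught by the MMU as a fault.
        return "invalid pointer"
    if symptom == "machine_check":
        kind = SCB_ERROR_VECTORS.get(vector, "unknown vector")
        # Neither of the above happened, so suspect hardware,
        # e.g. an uncorrectable memory error.
        return "hardware error (%s)" % kind
    raise ValueError("unknown symptom: %r" % (symptom,))


print(likely_cause("machine_check", 0x670))
# -> hardware error (processor machine check)
```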
I suspect you're getting a memory error. In that case, printing out
the EV5 portion of the logout area is not sufficient. You also need to
print out the platform-specific (i.e. KN8AE) logout area, which will
have the ECC memory error information, etc.
(At least, it looks like you only have the EV5 portion... maybe I'm mistaken.)
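Dumping the platform-specific area means following the logout frame header past the processor-specific part. A minimal sketch, assuming an SRM-style header of four little-endian longwords (frame size, flags, offset to the processor-specific area, offset to the system-specific area) with offsets measured from the start of the frame. That layout is an assumption, not something taken from the report, and `split_logout_frame` / `hexdump_quads` are hypothetical names.

```python
import struct

# ASSUMPTION: header = frame size, flags, cpu-area offset, system-area
# offset, each a little-endian 32-bit value; offsets are frame-relative.
HDR = struct.Struct("<IIII")


def split_logout_frame(frame):
    """Return (cpu_area, system_area) byte strings from a logout frame."""
    size, _flags, cpu_off, sys_off = HDR.unpack_from(frame, 0)
    if not (HDR.size <= cpu_off <= sys_off <= size <= len(frame)):
        raise ValueError("inconsistent logout frame header")
    cpu_area = frame[cpu_off:sys_off]  # EV5 part (what was printed)
    sys_area = frame[sys_off:size]     # platform part (e.g. KN8AE)
    return cpu_area, sys_area


def hexdump_quads(area, base=0):
    """Format an area as little-endian 64-bit quadwords, one per line."""
    lines = []
    for off in range(0, len(area) - len(area) % 8, 8):
        (q,) = struct.unpack_from("<Q", area, off)
        lines.append("%04x: 0x%016x" % (base + off, q))
    return lines


# Synthetic frame: 16-byte header, 16-byte CPU area, 16-byte system area.
cpu = struct.pack("<QQ", 0x1111, 0x2222)
sysa = struct.pack("<QQ", 0xDEADBEEF, 0)
frame = HDR.pack(48, 0, 16, 32) + cpu + sysa
cpu_area, sys_area = split_logout_frame(frame)
for line in hexdump_quads(sys_area, base=32):
    print(line)
```

The sanity check on the offsets matters in practice: a corrupted header should produce an error rather than a dump of unrelated memory.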
>
> Processor Machine Check (670), Code 0x100000096
> PAL temp[0-1] = 0x0000000000000000 0x0000006164000000
> PAL temp[2-3] = 0xfffffc00003004d4 0x0000000000008680
> PAL temp[4-5] = 0xfffffe00003a375c 0x0000000000000006
> PAL temp[6-7] = 0x0000000000000001 0xfffffc00003003e8
> PAL temp[8-9] = 0x1f1e161514020100 0xfffffc0000300474
> PAL temp[10-11] = 0xfffffc0000300354 0xfffffc0000300418
> PAL temp[12-13] = 0xfffffc00003003b8 0x0000005555400000
> PAL temp[14-15] = 0x0000000000000000 0x00000000040385d9
> PAL temp[16-17] = 0x0000009806700801 0x0000000000000000
> PAL temp[18-19] = 0x00000001fffff418 0xfffffe00073c59d8
> PAL temp[20-21] = 0x0000000006778000 0xfffffc0000300444
> PAL temp[22-23] = 0xfffffc000053c1d0 0x000000000673a000
> shadow[0-1] = 0x0000000000000000 0x0000000000000000
> shadow[2-3] = 0x0000000000000000 0x0000000000000000
> shadow[4-5] = 0x0000000000000000 0x0000000000000000
> shadow[6-7] = 0x0000000000000000 0x0000000000000000
>
> Excepting Instruction Addr = 0xfffffc0000300354
> Summary of arithmetic traps = 0x0000000000000000
> Exception mask = 0x0000000000000000
> Base address for PALcode = 0x0000000000018000
> Interrupt Status Reg = 0x0000000000000000
> Current setup of EV5 IBOX = 0x0000006164000000
> I-CACHE Reg Data parity error = 0x0000000000000800
> D-CACHE error Reg = 0x0000000000000000
> Effective VA = 0xfffffe00003a3658
> Reason for D-stream = 0x0000000000014350
> EV5 SCache address = 0xffffff000001d28f
> EV5 SCache TAG/Data parity = 0x0000000000000000
> EV5 BC_TAG_ADDR = 0xffffff80010d6fff
> EV5 EI_ADDR Phys addr of Xfer = 0xffffff000011d6df
> Fill Syndrome = 0x0000000000009000
> ei_stat reg = 0xfffffff004ffffff
> ld_lock = 0xffffff0004b363df
>
> unexpected machine check:
>
> mces = 0x1
> vector = 0x670
> param = 0xfffffc0000008b10
> pc = 0xfffffc0000300354
> ra = 0xfffffc00003002e0
> curproc = 0xfffffe00003a3600
> pid = 342, comm = diskex
>
> panic: machine check
> syncing disks... 1 1 1 done
>
> The PC decodes as:
>
> (gdb) x/i 0xfffffc0000300354
> 0xfffffc0000300354 <exception_restore_regs>: ldq v0,0(sp)
>
>
> I'll retain the kernel and core dump if anyone wants to look at it.
> >How-To-Repeat:
>
> >Fix:
>
> >Audit-Trail:
> >Unformatted:
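With only the console printout quoted above, a short script can at least list which EV5 error registers are nonzero and decode `mces`. A minimal sketch, assuming the `Name = 0xVALUE` line format shown in the report; the watch-list uses the labels from that printout. The MCES bit names follow the NetBSD/Alpha `ALPHA_MCES_*` definitions (MIP = machine check in progress); treat that table as an assumption. Flagging a register does not interpret it: that still needs the EV5 hardware documentation and the platform-specific area.

```python
import re

# Error-reporting registers, labelled as in the EV5 logout printout.
ERROR_REGS = (
    "I-CACHE Reg Data parity error",
    "D-CACHE error Reg",
    "EV5 SCache TAG/Data parity",
    "Fill Syndrome",
)

# "Name = 0xVALUE", optionally quoted with "> "; multi-value lines
# such as "PAL temp[0-1] = 0x... 0x..." deliberately do not match.
LINE_RE = re.compile(r"^>?\s*(.+?)\s*=\s*(0x[0-9a-fA-F]+)\s*$")

# ASSUMPTION: MCES bit assignments as in NetBSD's ALPHA_MCES_* macros.
MCES_BITS = {0x01: "MIP", 0x02: "SCE", 0x04: "PCE", 0x08: "DPC", 0x10: "DSC"}


def parse_logout(text):
    """Collect single-value register lines into a name -> int dict."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs


def nonzero_errors(regs):
    """Return the watched error registers whose value is nonzero."""
    return {name: regs[name] for name in ERROR_REGS if regs.get(name)}


def decode_mces(value):
    """Name the MCES bits that are set, lowest bit first."""
    return [name for bit, name in sorted(MCES_BITS.items()) if value & bit]


sample = """> I-CACHE Reg Data parity error = 0x0000000000000800
> D-CACHE error Reg = 0x0000000000000000
> Fill Syndrome = 0x0000000000009000
> mces = 0x1"""
regs = parse_logout(sample)
for name, val in nonzero_errors(regs).items():
    print("%s = 0x%x" % (name, val))
print(decode_mces(regs["mces"]))
```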
Jason R. Thorpe thorpej@nas.nasa.gov
NASA Ames Research Center Home: +1 408 866 1912
NAS: M/S 258-5 Work: +1 650 604 0935
Moffett Field, CA 94035 Pager: +1 650 428 6939