Subject: Re: arm32 kernel crashes
To: None <port-arm32@netbsd.org>
From: David Forbes <dmf20@hermes.cam.ac.uk>
List: port-arm32
Date: 12/15/1998 14:26:17
I've now rebuilt a kernel on my CATS box with Charles Hannum's modified
debugger, and things ran smoothly until activity on the serial port caused
the crash again. I've noticed that the point at which the crash occurs
varies (in number of characters exchanged on tty01), but the point in the
code is always the same.
Fault with intr_depth > 0
Data abort: 'Translation fault (page)' status=007 address=effffffc
PC=f0116a6c
Stopped in bash at irq_entry+0x88: ldr r2, [r7, r9, lsl#2]
In this particular case, a login was achieved and bash started. But not
for long.
db> tr
_comstart
_ttstart
_ttwrite
_comwrite
_spec_write
_ufsspec_write
_vn_write
_dofilewrite
_sys_write
_syscall
This is as before. (I've omitted the (_symbol +0x10) offsets because they
were all the same.)
db> show registers
spsr 0x40000093
r0 0
r1 _intr_disabled_mask
r2 0xe28ff441
r3 0x80000013
r4 0x1
r5 0xf1152000
r6 0xf114cb00
r7 0xf0180248 (_spl_masks)
r8 0
r9 0xfff9ff6d
r10 0xf4000000
r11 0xf37b9d5c
r12 0x1
usr_sp 0xefbfd394
usr_lr 0x200f1eb4
svc_sp 0xf37b9cdc
svc_lr _splx + 0x30
pc irq_entry + 0x88
und_sp 0xf37b8ff0
abt_sp 0xf01bc000
irq_sp 0xf01bb000
Looking at these values, I'm not surprised that ldr r2, [r7, r9, lsl#2]
failed. Anyway, attempting to continue just repeats the original Data
abort error, as many times as you like.
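As a quick check on the register values above, here is a small sketch
(in Python, purely for the arithmetic) of the effective address that
ldr r2, [r7, r9, lsl#2] computes. The helper name ldr_scaled_address is
made up for illustration; the r7 and r9 values are copied from the
register dump.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide; all arithmetic wraps


def ldr_scaled_address(base, index, shift):
    """Effective address of `ldr rd, [base, index, lsl #shift]`.

    Hypothetical helper for illustration only: shifts the index left,
    adds it to the base, and truncates to 32 bits as the CPU would.
    """
    return (base + ((index << shift) & MASK32)) & MASK32


r7 = 0xF0180248  # _spl_masks, from the register dump
r9 = 0xFFF9FF6D  # index register, from the register dump

fault_address = ldr_scaled_address(r7, r9, 2)
print(hex(fault_address))  # prints 0xeffffffc
```

That is the address reported in the Data abort message, so the dump and
the faulting instruction are at least consistent with each other.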
However, attempting to reboot:
db> reboot
boot: howto = 00000000 curproc = 0xf3787600
Warning IRQs disabled during boot()
syncing disks...22 21 10 done
Fault with intr_depth > 0
Data abort: 'Translation fault (page)' status = 007 address = 2004b330
PC=f0110df0
Stopped in update at _fetchuserword+0x30: ldr r0, [r0, #0x0000]
I'm assuming that this is related to the previous fault, so I haven't
noted the register values, etc. Issuing another reboot does so instantly,
and the machine comes back up with wd0a not marked clean, but wd0e is.
The code in irq_entry that causes the original fault is in
footbridge/footbridge_irq.S, in the section concerned with finding the
highest IPL.
mov r9, #(_SPL_LEVELS - 1)
ldr r7, Lspl_masks
Lfind_highest_ipl:
ldr r2, [r7, r9, lsl #2]	/* Fault here */
tst r8, r2
subeq r9, r9, #1
beq Lfind_highest_ipl
Now, according to the register dump, r8 is zero. Therefore TST r8, r2
will always set the Z flag, and EQ will always be true? If so, we keep
subtracting from r9 until we get a fault. According to the code, r8 should
hold the current IRQ requests.
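To illustrate the reasoning, here is a rough Python model of the
Lfind_highest_ipl loop. Several things in it are assumptions rather
than facts from the source: the value of SPL_LEVELS is hypothetical
(the real _SPL_LEVELS isn't quoted here), MAPPED_BASE assumes nothing
is mapped below 0xf0000000 (a guess based on the fault address), and
read_word stands in for memory whose contents don't matter when r8 is
zero.

```python
MASK32 = 0xFFFFFFFF

SPL_MASKS = 0xF0180248    # r7 (_spl_masks), from the register dump
SPL_LEVELS = 8            # hypothetical; the real _SPL_LEVELS is not quoted
MAPPED_BASE = 0xF0000000  # assumption: no mapping below this address


def find_highest_ipl(pending, read_word):
    """Model of the loop: r8 is `pending`, r9 counts down from the top level.

    Returns ("found", addr, r9) when a mask word overlaps `pending`, or
    ("fault", addr, r9) once the load address drops below MAPPED_BASE.
    """
    r9 = SPL_LEVELS - 1
    while True:
        # ldr r2, [r7, r9, lsl #2]
        addr = (SPL_MASKS + ((r9 << 2) & MASK32)) & MASK32
        if addr < MAPPED_BASE:
            return ("fault", addr, r9)
        r2 = read_word(addr)
        # tst r8, r2 ; subeq r9, r9, #1 ; beq Lfind_highest_ipl
        if pending & r2:
            return ("found", addr, r9)
        r9 = (r9 - 1) & MASK32  # wraps below zero, as the register would


def all_ones(addr):
    """Stand-in memory: every word reads as 0xffffffff."""
    return MASK32


# With no pending IRQs (r8 == 0) the test can never succeed.
kind, addr, r9 = find_highest_ipl(0, all_ones)
print(kind, hex(addr), hex(r9))
```

Under those assumptions the loop only stops when it walks off the
bottom of the mapped region, and it does so with the same fault address
(effffffc) and the same r9 (0xfff9ff6d) as in the dump. With any
pending bit set in r8 it terminates on the first pass instead.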
I'm presuming that, because everything else appears to function normally
until an error occurs, perhaps this code is not directly to blame?
Cheers,
David.
PS - I've rudely assumed in the above that db accounts for the pipeline
and the fault being given by an instruction further back in the code than
the one I've looked at. In retrospect, this seems a rather dim
assumption...