Subject: Re: 1.6_BETA3 /cats crashing
To: None <david@flossy.u-net.com>
From: Chris Gilbert <chris@paradox.demon.co.uk>
List: port-cats
Date: 07/07/2002 00:11:32
On Sun, 2002-07-07 at 00:37, david@flossy.u-net.com wrote:
> Chris,
>
> I've tried a few things to try to understand the problem with my machine:
>
> 1) Leaving it to run quietly for a while. No crash... implies that the problem is not a gradual memory leak leading to a crash.
>
> 2) Heavy internet usage. No crash... implies that it's not purely related to the ADSL modem I have on my USB port.
>
> 3) Heavy disk usage, with ADSL modem disconnected. "Runs" all night with load average > 9 with no crash. Implies that it's not just disk activity, which is not surprising, given that the trace suggested USB was involved.
>
> 4) Heavy disk usage, with ADSL modem connected, but no internet activity. Crashed fairly promptly.
>
> I've also managed to catch another trace, and this time I recorded the rlv values:
>
> Non-emulated page fault with intr_depth>0
> Data abort: 'Translation fault (page)' status=007 address=f73279f8 PC=f0219ea0
> Stopped in pid 9316 (gzcat) at memmove+0x310: strb r3,[r0, -#0x0001]!
>
> usb_transfer_complete +0xc
> rlv = 0xf001a4c4 (ohci_softintr +0xdc)
> ohci_softintr +0xc
> rlv = 0xf00a4168 (softclock +0x1b8)
> softclock +0xc
> rlv = 0xf01ae384 (dosoftints +0x74)
> dosoftints +0xc
> rlv = 0xf0197e54 (exitirq +0x30)
>
> I can't imagine that the nature of the USB device attached can cause this behaviour, so I'm wondering who else is using USB devices on a CATS machine successfully?
>
> However, I'm now slightly at a loss as to where to go next. My system is practically unusable under 1.6 by comparison, and it's been over three years since I last messed with the kernel. I don't seem to be able to get a core dump: by the time the crash occurs, the system is so hosed it can't even sync disks. Any suggestions?
Could you try using:
wd* at pciide? channel ? drive ? flags 0x0fac
This might slow the disks down, as it disables UDMA, but we've seen issues
with UDMA and 2 disks. I'm wondering if it's just a generic case of IRQs
being messed up.
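
For reference, the flags value can be read one nibble at a time. The sketch
below assumes the field layout described in wd(4): bits 0-3 select the PIO
mode, bits 4-7 the DMA mode and bits 8-11 the Ultra-DMA mode, and within each
field the top bit means "use this value" while all ones means "disable". The
function names are made up for illustration, and the layout should be checked
against the man page for the release in use.

```python
# Decode a wd(4)-style "flags" value into its three 4-bit fields.
# ASSUMPTION: field layout as recalled from wd(4); verify for your release.

def decode_field(nibble):
    """Interpret one 4-bit mode field."""
    if nibble == 0xF:
        return "disabled"            # all ones: do not use this transfer type
    if nibble & 0x8:
        return "mode %d" % (nibble & 0x7)   # top bit set: low 3 bits are the mode
    return "driver default"          # top bit clear: field is ignored


def decode_wd_flags(flags):
    """Split flags into PIO, DMA and Ultra-DMA settings."""
    return {
        "PIO": decode_field(flags & 0xF),
        "DMA": decode_field((flags >> 4) & 0xF),
        "UDMA": decode_field((flags >> 8) & 0xF),
    }


print(decode_wd_flags(0x0FAC))
```

Under that assumption, 0x0fac keeps PIO mode 4 and DMA mode 2 and turns
Ultra-DMA off, which matches the intent of the line above.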
Certainly I'm starting to look at the cats IRQ handling, and have been
trying to think of how to do generic soft interrupts (other archs have
<32 interrupt bits, whereas we've actually got a full 32 interrupts on
footbridge).
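
To picture the bit-count problem: if all 32 bits of a word are taken by
hardware interrupt sources, there are no spare bits left in that word for
soft interrupts, so they would need a word of their own. The sketch below
only illustrates that idea; it is not the cats or footbridge code, and the
soft interrupt names and the Pending class are hypothetical.

```python
HW_IRQ_BITS = 32   # footbridge: every bit is a hardware source (per the text)

# Hypothetical soft interrupt numbers; these index a separate word.
SOFT_CLOCK, SOFT_NET, SOFT_SERIAL = range(3)


class Pending:
    """Pending-interrupt state with separate hardware and soft words."""

    def __init__(self):
        self.hw = 0     # one bit per hardware IRQ, 0..31
        self.soft = 0   # one bit per soft interrupt

    def post_hw(self, irq):
        if not 0 <= irq < HW_IRQ_BITS:
            raise ValueError("no such hardware IRQ")
        self.hw |= 1 << irq

    def post_soft(self, which):
        self.soft |= 1 << which

    def take_soft(self, blocked=0):
        """Return and clear the soft interrupts that are pending and not blocked."""
        ready = self.soft & ~blocked
        self.soft &= ~ready
        return [n for n in range(ready.bit_length()) if ready & (1 << n)]


p = Pending()
p.post_hw(31)                 # the top hardware bit is a real IRQ here
p.post_soft(SOFT_CLOCK)
p.post_soft(SOFT_NET)
print(p.take_soft(blocked=1 << SOFT_NET))   # SOFT_NET stays pending
```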
I might try rigging up my USB speakers and doing some disk work to see if I
can provoke a crash.
Chris