Mike Pumford wrote:
Well, I've now done some further digging. If I go back to a kernel from the start of October (-D2011001 as a checkout date), with the patches attached to my previous email and the one toolchain change made after that (the one that stops as crashing) applied as well, then I get a stable working system. Some other oddities that I had put down to mismatches between the userland and the kernel also disappear. A build with a checkout from 1st November is as broken as the head, so a change made between 1/10 & 1/11 has broken things for me.

For the last week I've been trying to get an up-to-date current kernel running on my acorn32 system, which had been running 5.99.16 stably for the last year or so. I had to fix a couple of problems in the device drivers to get the kernel to boot at all, but once I got to that stage I kept seeing panics and hangs after about 5-10 minutes (DIAGNOSTIC disabled) or even before getting fully multi-user (DIAGNOSTIC enabled). Here is a selection of the crashes from a non-DIAGNOSTIC kernel:

uvmfault (f0302204, f0319000, 2) -> e
Fatal kernel mode data abort: 'Translation Fault (P)'
trapframe 0xf301fd28
FSR=185050f7, FAR f1319f70, spsr a0000013
r0 =f1179000, r1 =00000002, r2 =00000035 r3 =f1319f6c
r4 =0000007f r5 =f2764c04, r6 =f117904a r7 =f1179048
r8 =00010011, r9 =ffdc0000, r10=0000001b r11=f301fe20
r12=f1319fc6, ssp=f301fd74, slr=f00b7b1c pc =f00b6b64
The unexpected oddness I get with the newer kernel is that ntpdc is unable to talk to ntpd to report status. My suspicion is that some unintended binary incompatibility was introduced sometime in November, and that it is causing the stability problems and network panics I'm seeing with kernels built after that date.
And yes, I do have all the COMPAT_XX options in my kernel config.

Mike