Port-sparc64 archive
Re: Ultrasparc III+ kernel panic
On Wed, 1 Apr 2015, BERTRAND Joël wrote:
> Hello,
>
> New panic last night...
>
> 1 tt=30 tstate=4411001505 tpc=0x1001488 tnpc=0x100148c
> 2 tt=30 tstate=4482000603 tpc=0x12e1da0 tnpc=0x12e1da4
>
> Debug information :
> (gdb) list *(0x1001488)
> (gdb) x/i 0x1001488
> 0x1001488 <uspillk4+8>: sta %l0, [ %sp ] %asi
> (gdb) list *(0x12e1da0)
> 0x12e1da0 is in mutex_vector_enter (/usr/src/sys/kern/kern_mutex.c:440).
> 435 * fast-path stubs are available. If an mutex_spin_enter() stub is
> 436 * not available, then it is also aliased directly here.
> 437 */
> 438 void
> 439 mutex_vector_enter(kmutex_t *mtx)
> 440 {
> 441 uintptr_t owner, curthread;
> 442 turnstile_t *ts;
> 443 #ifdef MULTIPROCESSOR
> 444 u_int count;
> (gdb) x/i 0x12e1da0
> 0x12e1da0 <mutex_vector_enter>: save %sp, -176, %sp
>
> mach stack does not return usable information. Only :
> db{0} > mach stack
> Window 0 frame64 0xe004ff50 locals, ins:
> 10426baa0 0 15a068000 1044914d0 fffffffffefa2000 0 102cfafd0 180f680
> 0 0 0 0 0 0 ffffffffffffa011=sp fffffffffed6d200=pc:fffffffffed6d200
> Window 1 frame64 0xffffffffffffa810 locals, ins:
>
> You can see that this panic is exactly the same than last panic.
I looked at the archives and it doesn't look like I commented on this
previously.
I'm assuming the trap stack is semi-accurate. The save instruction should
not be able to generate a data access fault, but then the low level bits
of locore.s do some interesting gymnastics with the trap stack to prevent
loss of data, so it may have moved things around.
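For reading the tstate values in the trap stack, a small decoder may help. This is only a sketch: it assumes the dump prints tstate in hex without a 0x prefix and that the register follows the SPARC V9 TSTATE layout (CCR in bits 39:32, ASI in 31:24, PSTATE in 19:8, CWP in 4:0). The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of TSTATE, assuming the SPARC V9 layout. */
struct tstate_fields {
    unsigned ccr;    /* bits 39:32, condition codes */
    unsigned asi;    /* bits 31:24, %asi at the time of the trap */
    unsigned pstate; /* bits 19:8 */
    unsigned cwp;    /* bits 4:0, current window pointer */
};

/* Split a raw TSTATE value into its fields. */
static struct tstate_fields decode_tstate(uint64_t tstate)
{
    struct tstate_fields f;

    /* Sanity check for this sketch: nothing expected above bit 39. */
    assert((tstate >> 40) == 0);

    f.ccr    = (unsigned)((tstate >> 32) & 0xff);
    f.asi    = (unsigned)((tstate >> 24) & 0xff);
    f.pstate = (unsigned)((tstate >> 8) & 0xfff);
    f.cwp    = (unsigned)(tstate & 0x1f);
    return f;
}
```

Under those assumptions, decode_tstate(0x4411001505ULL) gives ASI 0x11, which in V9 is the as-if-user-secondary ASI. That would be consistent with a handler storing user windows through %asi, as the sta at uspillk4+8 does.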
uspillk4 is used to save alternate space register windows to the stack.
The order of operations is:
1) The CPU is running userland code and traps into the kernel.
2) The kernel switches to the kernel stack and moves the contents of
%canrestore to %otherwin to indicate those register windows do not belong
to the current address space.
3) The kernel does some stuff and eventually calls mutex_vector_enter().
4) mutex_vector_enter() needs a new register window, so it does a save.
5) The register windows are full, so the CPU takes a window spill trap.
Since %otherwin is not zero, it goes to uspillk4 to save other address
space windows instead of kspill4.
6) The trap handler tries to save the window and takes a data fault.
7) The data fault handler punts.
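Steps 2, 4 and 5 can be modelled with three counters. The following is a toy model in C, not kernel code: the struct, the function names and the NWINDOWS value of 8 are assumptions for illustration, and the handler is chosen only by whether %otherwin is zero (the real trap vector also depends on %wstate, which this sketch ignores).

```c
#include <assert.h>
#include <stddef.h>

enum { NWINDOWS = 8 }; /* assumed window count for the model */

/* The three window-state counters involved in the sequence. */
struct winstate {
    unsigned cansave;
    unsigned canrestore;
    unsigned otherwin;
};

/* Step 2: on kernel entry, mark the user's windows as "other". */
static void enter_kernel(struct winstate *w)
{
    w->otherwin = w->canrestore;
    w->canrestore = 0;
}

/*
 * Steps 4-5: attempt a save. Returns NULL if a window was free,
 * otherwise the name of the spill handler that would run.
 */
static const char *do_save(struct winstate *w)
{
    /* V9 invariant on the window counters. */
    assert(w->cansave + w->canrestore + w->otherwin == NWINDOWS - 2);

    if (w->cansave != 0) {
        w->cansave--;
        w->canrestore++;
        return NULL;
    }
    /* No free window: spill. Other-space windows take priority. */
    return w->otherwin != 0 ? "uspillk4" : "kspill4";
}
```

With all six usable windows holding user frames ({0, 6, 0}), enter_kernel() followed by do_save() selects uspillk4, which is the case described in step 5.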
What should happen is:
The CPU takes a spill trap (from the save) at trap level 1.
It takes a data fault at trap level 2.
The data fault handler jumps to winfault. winfault will look at the
current trap level. Since it's not 1, it executes some fancy code to
fiddle with the trap stack and figure out what's really happening. It
should detect a fault during a spill and go to winfixspill.
winfixspill code should save all the otherwin windows to slots in the PCB,
and then continue executing kernel code.
Eventually, when returning to userland, the trap return code will restore
all the userland windows from the PCB and return to userland code.
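The decision winfault has to make can be sketched in C as below. This is not the locore.s logic, just a model of it: the enum and function names are hypothetical, and it assumes the check is done on the trap type recorded at the outer trap level, using the SPARC V9 ranges for spill traps (0x080-0x0bf) and fill traps (0x0c0-0x0ff).

```c
#include <assert.h>

/* Possible outcomes of the dispatch (names are illustrative). */
enum fault_path {
    PATH_PLAIN_FAULT, /* ordinary data fault at trap level 1 */
    PATH_FIX_SPILL,   /* fault inside a spill handler */
    PATH_FIX_FILL,    /* fault inside a fill handler */
    PATH_UNEXPECTED   /* nested fault that is neither */
};

/*
 * tl:       trap level at which the data fault was taken
 * outer_tt: trap type recorded at trap level tl - 1
 */
static enum fault_path classify_fault(unsigned tl, unsigned outer_tt)
{
    assert(tl >= 1);

    if (tl == 1)
        return PATH_PLAIN_FAULT;
    if (outer_tt >= 0x080 && outer_tt <= 0x0bf)
        return PATH_FIX_SPILL;
    if (outer_tt >= 0x0c0 && outer_tt <= 0x0ff)
        return PATH_FIX_FILL;
    return PATH_UNEXPECTED;
}
```

In this model the expected case is classify_fault(2, tt) with tt in the spill range, which leads to the winfixspill path. The reported dump shows tt=30 at both levels, which a check of this shape would not treat as a spill. Whether that reflects the real state or the trap stack having been moved around is not something the sketch can tell.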
winfix has a bunch of diagnostic code still enabled. You do not seem to
be hitting any of the sir instructions sprinkled in the code that would
reset the box.
There's also more debug and diagnostic code in there that is currently
disabled. You might want to try turning some of the NOT_DEBUG or
NOTDEF_DEBUG code on.
Also, look for calls to panic. On line 2149 there's a ta 1, which will
cause a trap, before the call to panic. That made sense when the kernel
still had traptrace, since that would generate a traptrace entry before all
hell broke loose. Now it probably just makes things worse. Try removing it
so that panic is really called there, or change it to an sir instruction to
generate a reset.
There's another ta 1 on line 2306 to trap to the debugger. Since trapping
to ddb is not reliable in this situation, change it to an sir instruction.
Anyway, you probably need to instrument that code path to see where it's
getting confused.
And keep in mind that code is semi-recursive, in that you can take a data
fault while trying to clean up state in order to handle a data fault.
Eduardo