Subject: Re: kernel profiling broken on mips?
To: Castor Fu <castor@geocast.net>
From: Jonathan Stone <jonathan@DSG.Stanford.EDU>
List: port-mips
Date: 02/21/1999 21:06:17
I've fixed a couple of locore routines that need to be non-profiled,
but weren't. To make sure there aren't any more, I've built locore*.o
with profiling turned off in all the tests below.
If I handcraft _mcount() (via the MCOUNT macro in <mips/profile.h>,
which gets expanded in sys/lib/libkern/mcount.c) to not write to the
stack space, then I get kernels where PC-sampling works.
To try and debug this, I've deleted the actual call to __mcount() and
changed the code that saved a0-a3 to just write a couple of zeros:
#define MCOUNT \
__asm__(".globl _mcount;" \
".type _mcount,@function;" \
"_mcount:;" \
".set noreorder;" \
".set noat;" \
"subu $29,$29,16;" \
"nop;" \
"sw $0, 4($29);" \
"sw $0, 8($29);" \
"nop;" \
"addu $29,$29,24;" \
"j $31;" \
"move $31,$1;" \
".set reorder;" \
".set at");
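
As a sanity check on the stub above, here is a small C model of its
stack-pointer arithmetic. It is only a sketch, not kernel code. It
assumes the usual gcc MIPS profiling call sequence, in which the caller
executes "subu sp,sp,8" in the delay slot of "jal _mcount" (that
sequence is not shown in this message), which would explain why the stub
subtracts 16 but adds back 24. The names mcount_trace, model_mcount_call
and check_mcount_balance are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: tracks only the stack pointer, nothing else. */
struct mcount_trace {
    uint32_t sp_at_entry;   /* sp when _mcount starts running */
    uint32_t store_lo;      /* address written by "sw $0, 4($29)" */
    uint32_t store_hi;      /* address written by "sw $0, 8($29)" */
    uint32_t sp_at_return;  /* sp after "addu $29,$29,24" */
};

static struct mcount_trace model_mcount_call(uint32_t caller_sp)
{
    struct mcount_trace t;
    uint32_t sp = caller_sp;

    sp -= 8;                /* ASSUMED: caller's "subu sp,sp,8" in the jal delay slot */
    t.sp_at_entry = sp;
    sp -= 16;               /* "subu $29,$29,16" */
    t.store_lo = sp + 4;    /* "sw $0, 4($29)" */
    t.store_hi = sp + 8;    /* "sw $0, 8($29)" */
    sp += 24;               /* "addu $29,$29,24" undoes both adjustments */
    t.sp_at_return = sp;
    return t;
}

/* Checks that sp is balanced and both stores land below the entry sp. */
static void check_mcount_balance(uint32_t caller_sp)
{
    struct mcount_trace t = model_mcount_call(caller_sp);

    assert(t.sp_at_return == caller_sp);
    assert(t.store_lo < t.store_hi);
    assert(t.store_hi < t.sp_at_entry);
}
```

Under that assumption, calling check_mcount_balance() with any plausible
kernel stack pointer passes: the two stores land inside the 16 bytes the
stub allocates, and sp comes back to the caller's original value. So the
stub looks balanced on paper, and the failure has to come from somewhere
other than simple sp arithmetic.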
But if _mcount() even just tries to write zeros into the stack space
it allocates, I get a kernel panic inside tsleep() right after
mi_switch() returns. Here's a stacktrace:
status=0x2004ff02, cause=0x8, epc=0x8005f9d4, vaddr=0x105
pid=0 cmd=swapper usp=0x0 ksp=0x801e4e40
Stopped in swapper at _bpendtsleep: lbu v1,261(s0)
db> trace
_tsleep+1b4 (ff00,36d0d740,801adbb0,36d0d740) ra 80130b48 sz 48
_uvm_scheduler+bc (ff00,36d0d740,801adbb0,36d0d740) ra 8004ddf8 sz 24
_main+6a4 (ff00,36d0d740,801adbb0,36d0d740) ra 80de7290 sz 80
And, indeed, s0 (where the local variable |p| is allocated) is zero.
My best guess is that something in the context-switch code breaks when
it interacts with the profiling code, but I have no idea what.
This only seems to happen shortly after a user process exits.
I've tried allocating extra space in mcount(), and putting its state
at the top or bottom of the over-allocated frame; neither one seemed
to fix the problem. Anyone got a clue? Nisimura-san?