Subject: Re: mutex fault
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Kazushi (Jam) Marukawa <jam@pobox.com>
List: port-xen
Date: 11/26/2007 20:24:45
Hi Manuel,
Thanks for checking codes. As you requested, I ran the new
kernel after compiling with -g option (call it netbsd1122g)
and got crash trase of it. Unfortunately, it seems the
same. The address is also the same, callout_softclock+0x24d.
Disasseble of callout_softclock shows following (0x24d=589):
0xc043151a <callout_softclock+586>: call *0xffffffec(%ebp)
0xc043151d <callout_softclock+589>: movl $0xc098d594,(%esp)
0xc0431524 <callout_softclock+596>: call 0xc04db580 <mutex_spin_enter>
Line numbers are:
callout_softclock+586: kern/kern_timeout.c 604
callout_softclock+589: kern/kern_timeout.c 605
callout_softclock+596: kern/kern_timeout.c 605
And here is the source code.
602 KERNEL_UNLOCK_ONE(curlwp);
603 } else
604 (*func)(arg);
605 mutex_spin_enter(&callout_lock);
606
607 /*
608 * We can't touch 'c' here because it might be
609 * freed already. If LWPs waiting for callout
> it seems it called mutex_spin_enter instead
> of mutex_spin_exit.
The return address points the address where the program goes
back, so it points the next address of what it was executing
in the previous stack frame.
So, I guess "func" has a pointer to the mutex_vector_exit
function. And kernel called it at line 604 above. Then,
the kernel crashed inside the function pointed by "func
(=mutex_vector_exit())".
I'm not sure why or how "func" gets pointer to the
mutex_vector_exit, though. Maybe several mutex calls were
overlapped?
Anyway, here is a new stack trace from the kernel compiled
with "-g". It's almost the same as the previous one.
----
Mutex error: mutex_vector_exit: exiting unheld spin mutex
lock address : 0x00000000c098dde0
current cpu : 0
current lwp : 0x00000000ca7a5e00
owner field : 0x0000000000000b00 wait/spin: 0/1
panic: lock error
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c04ca539 cs 9 eflags 246 cr2 0 ilevel b
Stopped in pid 0.2 (system) at netbsd:breakpoint+0x1: ret
db> trace
breakpoint(c0879e8d,c0876aa5,c0701570,c0876a41,c098dde0) at netbsd:breakpoint+0x1
lockdebug_abort(c098dde0,c08e72ac,c0701570,c0876a41,cd48826c) at netbsd:lockdebug_abort+0x61
mutex_abort(c098dde0,c0701570,c0876a41,c04db1d7,cd48826c) at netbsd:mutex_abort+0x34
mutex_vector_exit(c098dde0,ca003b9a,cafb6f78,c04db275,c092a560) at netbsd:mutex_vector_exit+0xcf
callout_softclock(0,0,cafb6fb8,c04dbf26,0) at netbsd:callout_softclock+0x24d
softintr_dispatch(0,cafb6fa8,c0ae000f,f,fffff000) at netbsd:softintr_dispatch+0x3f
DDB lost frame for netbsd:Xsoftclock+0x3c, trying 0xcafb6fa0
Xsoftclock() at netbsd:Xsoftclock+0x3c
--- interrupt ---
--- switch to interrupt stack ---
?(31,ca000011,ca780011,0,35f3) at 0x1
On Nov 26, 0:04, Manuel Bouyer wrote:
> Subject: Re: mutex fault
>
> Thanks. Can you reproduce the panic with this kernel (in case it's not
> at callout_softclock+0x24d any more), and also map the address to a
> line number in C code ? it's
> l *(callout_softclock+0x24d)
> in gdb (easier than doing dissassembly by hand :)
>
> From this it's the same as I get, but it's not consistent with the
> backtrace you posted earlier ... it seems it called mutex_spin_enter instead
> of mutex_spin_exit.
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
>-- End of excerpt from Manuel Bouyer
-- Kazushi
Last week a cop stopped me in my car. He asked me if I had a police
record. I said, no, but I have the new DEVO album. Cops have no sense
of humor.