NetBSD-Users archive


Re: [10.0_STABLE] Hard lock



Michael van Elst wrote:
> joel.bertrand%systella.fr@localhost (BERTRAND Joël) writes:
> 
>>> bouyer@antioche.eu.org (Manuel Bouyer) writes:
>>>> This clearly means there is a locking problem here. And it may explain the
>>>> hard lock
> 
> So far I have never seen a LOCKDEBUG error, but there are at least two
> locking problems.
> 
> It's possible that a request completes before a timeout for it is even
> started. That caused the ccb_timeout without a (related) connection.
> 
> It's possible that a PDU is attempted to be freed twice (as part of
> sending out, and as part of timeout handling). The second free is
> ignored (causing the message about an unused PDU being freed),
> unless that PDU is already reused with possibly more severe
> consequences.
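
For illustration only, the double-free hazard described above can be sketched in user-space C. This is not the iscsi driver code: `struct pdu`, `pdu_alloc()` and `pdu_free()` are hypothetical names, and the generation counter is an assumed technique, not something the actual patches are known to use. The idea is that each free must present the generation it was handed at allocation, so a late second free (for example from timeout handling) is rejected even if the PDU has already been reused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical PDU slot. One atomic word encodes both facts:
 *   bit 0      : 1 = in use, 0 = free
 *   other bits : generation counter, bumped on every successful free
 */
struct pdu {
	atomic_uint word;
};

void
pdu_init(struct pdu *p)
{
	atomic_init(&p->word, 0u);
}

/* Claim a free PDU; on success store its generation in *gen. */
bool
pdu_alloc(struct pdu *p, unsigned *gen)
{
	unsigned old = atomic_load(&p->word);

	if (old & 1u)
		return false;		/* already in use */
	if (!atomic_compare_exchange_strong(&p->word, &old, old | 1u))
		return false;		/* lost a race with another allocator */
	*gen = old >> 1;
	return true;
}

/*
 * Release a PDU. Succeeds only if the PDU is in use AND still belongs
 * to the generation the caller was given. A second free of the same
 * generation fails, whether the PDU is currently free or reused.
 */
bool
pdu_free(struct pdu *p, unsigned gen)
{
	unsigned expected = (gen << 1) | 1u;	/* in use, caller's generation */
	unsigned released = (gen + 1u) << 1;	/* free, next generation */

	return atomic_compare_exchange_strong(&p->word, &expected, released);
}
```

With this scheme the send path and the timeout path can both call `pdu_free()` with the generation they hold; exactly one of them succeeds, and the loser can tell from the `false` return that it must not touch the PDU again.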

	OK. Your latest iscsi patches seem to fix the early panic with the
LOCKDEBUG option enabled. That said, my server ran a LOCKDEBUG kernel
last night and... panicked again.

	Here is the dmesg output following kernel initialization:

[  3330.852084] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d33140
[ 15220.162147] {drm:netbsd:intel_pipe_update_start+0x33f} *ERROR* Potential atomic update failure on pipe A: -35
[ 20505.361148] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d36848
[ 20775.901586] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d36fa0
[ 22449.264709] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d38b88
[ 23556.367217] {drm:netbsd:intel_pipe_update_start+0x33f} *ERROR* Potential atomic update failure on pipe A: -35
[ 24082.508400] cpu4[3392 slave]: hogging kernel lock
[ 24082.508400] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.508400] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.508400] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.508400] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.508400] --- interrupt ---
[ 24082.508400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.508400] pool_get() at netbsd:pool_get+0x78
[ 24082.508400] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.518403] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.518403] m_get() at netbsd:m_get+0x37
[ 24082.518403] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.518403] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.518403] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.518403] soreceive() at netbsd:soreceive+0x66a
[ 24082.518403] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.528401] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.528401] syscall() at netbsd:syscall+0x196
[ 24082.528401] --- syscall (number 155) ---
[ 24082.528401] netbsd:syscall+0x196:
[ 24082.528401] cpu4[3392 slave]: hogging kernel lock
[ 24082.528401] Kernel lock error: _kernel_lock,266: spinout

[ 24082.528401] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.528401] lock address : netbsd:kernel_lock
[ 24082.528401] type         : spin
[ 24082.528401] initialized  : netbsd:main+0x72
[ 24082.528401] shared holds :                  0 exclusive:                  1
[ 24082.528401] shares wanted:                  0 exclusive:                  5
[ 24082.528401] relevant cpu :                  2 last held:                  4
[ 24082.528401] relevant lwp : 0xffffaceebb7ad6c0 last held: 0xffffaceec299d0c0
[ 24082.528401] last locked* : netbsd:tcp_rcvd_wrapper+0x1a
[ 24082.528401] unlocked     : netbsd:ipintr+0x1e7
[ 24082.528401] curcpu holds :                  0 wanted by: 0xffffaceebb7ad6c0

[ 24082.528401] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[ 24082.528401] cpu2: Begin traceback...
[ 24082.528401] vpanic() at ipi_cpu_handler() at netbsd:vpanic+0x183
[ 24082.528401] netbsd:ipi_cpu_handler+0x99
[ 24082.528401] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.528401] panic() at Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.528401] --- interrupt ---
[ 24082.528401] netbsd:panic+0x3c
[ 24082.528401] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.528401] lockdebug_abort1() at pool_get() at netbsd:pool_get+0x78
[ 24082.528401] netbsd:lockdebug_abort1+0xe6
[ 24082.528401] pool_cache_get_slow() at _kernel_lock() at netbsd:_kernel_lock+0x2a7
[ 24082.528401] netbsd:pool_cache_get_slow+0x13d
[ 24082.538399] pool_cache_get_paddr() at mb_drain() at netbsd:pool_cache_get_paddr+0x264
[ 24082.538399] netbsd:mb_drain+0x17
[ 24082.538399] m_get() at netbsd:m_get+0x37
[ 24082.538399] pool_grow() at netbsd:pool_grow+0x3b9
[ 24082.538399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.538399] pool_get() at netbsd:pool_get+0x3e5
[ 24082.538399] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.538399] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.538399] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.538399] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.538399] soreceive() at netbsd:soreceive+0x66a
[ 24082.538399] m_get() at netbsd:m_get+0x37
[ 24082.538399] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.538399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.538399] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.538399] wm_add_rxbuf() at netbsd:wm_add_rxbuf+0x3a
[ 24082.538399] syscall() at netbsd:syscall+0x196
[ 24082.538399] --- syscall (number 155) ---
[ 24082.538399] wm_rxeof() at netbsd:syscall+0x196:
[ 24082.548399] cpu4[3392 slave]: hogging kernel lock
[ 24082.548399] netbsd:wm_rxeof+0x114
[ 24082.548399] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.548399] wm_handle_queue() at netbsd:wm_handle_queue+0xff
[ 24082.548399] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.548399] softint_dispatch() at netbsd:softint_dispatch+0x11c
[ 24082.548399] x86_ipi_handler() at DDB lost frame for netbsd:x86_ipi_handler+0x79
[ 24082.548399] netbsd:Xsoftintr+0x4c, trying 0xffff87043770a0f0
[ 24082.548399] Xsoftintr() at netbsd:Xsoftintr+0x4c
[ 24082.548399] --- interrupt ---
[ 24082.548399] be9875aa43e74af1:
[ 24082.548399] cpu2: End traceback...
[ 24082.548399] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.548399] --- interrupt ---

[ 24082.548399] dumping to dev 18,1 (offset=253015, size=4162677):
[ 24082.548399] dump mutex_enter() at netbsd:mutex_enter+0x50f
[ 24082.548399] pool_get() at netbsd:pool_get+0x78
[ 24082.548399] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.548399] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.548399] m_get() at netbsd:m_get+0x37
[ 24082.558399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.558399] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.558399] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.558399] soreceive() at netbsd:soreceive+0x66a
[ 24082.558399] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.558399] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.558399] syscall() at netbsd:syscall+0x196
[ 24082.558399] --- syscall (number 155) ---
[ 24082.558399] netbsd:syscall+0x196:
[ 24082.558399] cpu4[3392 slave]: hogging kernel lock
[ 24082.558399] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.558399] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.568400] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.568400] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.568400] --- interrupt ---
[ 24082.568400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.568400] pool_get() at netbsd:pool_get+0x78
[ 24082.568400] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.568400] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.568400] m_get() at netbsd:m_get+0x37
[ 24082.568400] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.568400] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.578401] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.578401] soreceive() at netbsd:soreceive+0x66a
[ 24082.578401] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.578401] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.578401] syscall() at netbsd:syscall+0x196
[ 24082.578401] --- syscall (number 155) ---
[ 24082.578401] netbsd:syscall+0x196:
[ 24082.578401] cpu4[3392 slave]: hogging kernel lock
[ 24082.578401] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.578401] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.578401] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.578401] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.578401] --- interrupt ---
[ 24082.588400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.588400] pool_get() at netbsd:pool_get+0x78

	Regards,

	JB



