Michael van Elst wrote:
> joel.bertrand%systella.fr@localhost (BERTRAND Joël) writes:
>
>>> bouyer@antioche.eu.org (Manuel Bouyer) writes:
>>>> This clearly means there is a locking problem here. And it may explain the
>>>> hard lock
>
> So far I have never seen a LOCKDEBUG error, but there are at least two
> locking problems.
>
> It's possible that a request completes before a timeout for it is even
> started. That caused the ccb_timeout without a (related) connection.
>
> It's possible that a PDU is attempted to be freed twice (as part of
> sending out, and as part of timeout handling). The second free is
> ignored (causing the message about an unused PDU being freed),
> unless that PDU is already reused with possibly more severe
> consequences.

OK. Your last patches against iscsi seem to fix the early panic with the
LOCKDEBUG option enabled. That said, my server ran a LOCKDEBUG kernel
last night and... panicked again. Here is a copy of dmesg after kernel
initialization:

[ 3330.852084] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d33140
[ 15220.162147] {drm:netbsd:intel_pipe_update_start+0x33f} *ERROR* Potential atomic update failure on pipe A: -35
[ 20505.361148] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d36848
[ 20775.901586] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d36fa0
[ 22449.264709] ccb_timeout: num=1 total=0 disp=0 invalid ccb=0xffff870021d38b88
[ 23556.367217] {drm:netbsd:intel_pipe_update_start+0x33f} *ERROR* Potential atomic update failure on pipe A: -35
[ 24082.508400] cpu4[3392 slave]: hogging kernel lock
[ 24082.508400] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.508400] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.508400] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.508400] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.508400] --- interrupt ---
[ 24082.508400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.508400] pool_get() at netbsd:pool_get+0x78
[ 24082.508400] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.518403] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.518403] m_get() at netbsd:m_get+0x37
[ 24082.518403] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.518403] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.518403] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.518403] soreceive() at netbsd:soreceive+0x66a
[ 24082.518403] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.528401] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.528401] syscall() at netbsd:syscall+0x196
[ 24082.528401] --- syscall (number 155) ---
[ 24082.528401] netbsd:syscall+0x196:
[ 24082.528401] cpu4[3392 slave]: hogging kernel lock
[ 24082.528401] Kernel lock error: _kernel_lock,266: spinout
[ 24082.528401] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.528401] lock address : netbsd:kernel_lock
[ 24082.528401] type : spin
[ 24082.528401] initialized : netbsd:main+0x72
[ 24082.528401] shared holds : 0 exclusive: 1
[ 24082.528401] shares wanted: 0 exclusive: 5
[ 24082.528401] relevant cpu : 2 last held: 4
[ 24082.528401] relevant lwp : 0xffffaceebb7ad6c0 last held: 0xffffaceec299d0c0
[ 24082.528401] last locked* : netbsd:tcp_rcvd_wrapper+0x1a
[ 24082.528401] unlocked : netbsd:ipintr+0x1e7
[ 24082.528401] curcpu holds : 0 wanted by: 0xffffaceebb7ad6c0
[ 24082.528401] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[ 24082.528401] cpu2: Begin traceback...
[ 24082.528401] vpanic() at ipi_cpu_handler() at netbsd:vpanic+0x183
[ 24082.528401] netbsd:ipi_cpu_handler+0x99
[ 24082.528401] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.528401] panic() at Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.528401] --- interrupt ---
[ 24082.528401] netbsd:panic+0x3c
[ 24082.528401] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.528401] lockdebug_abort1() at pool_get() at netbsd:pool_get+0x78
[ 24082.528401] netbsd:lockdebug_abort1+0xe6
[ 24082.528401] pool_cache_get_slow() at _kernel_lock() at netbsd:_kernel_lock+0x2a7
[ 24082.528401] netbsd:pool_cache_get_slow+0x13d
[ 24082.538399] pool_cache_get_paddr() at mb_drain() at netbsd:pool_cache_get_paddr+0x264
[ 24082.538399] netbsd:mb_drain+0x17
[ 24082.538399] m_get() at netbsd:m_get+0x37
[ 24082.538399] pool_grow() at netbsd:pool_grow+0x3b9
[ 24082.538399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.538399] pool_get() at netbsd:pool_get+0x3e5
[ 24082.538399] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.538399] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.538399] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.538399] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.538399] soreceive() at netbsd:soreceive+0x66a
[ 24082.538399] m_get() at netbsd:m_get+0x37
[ 24082.538399] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.538399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.538399] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.538399] wm_add_rxbuf() at netbsd:wm_add_rxbuf+0x3a
[ 24082.538399] syscall() at netbsd:syscall+0x196
[ 24082.538399] --- syscall (number 155) ---
[ 24082.538399] wm_rxeof() at netbsd:syscall+0x196:
[ 24082.548399] cpu4[3392 slave]: hogging kernel lock
[ 24082.548399] netbsd:wm_rxeof+0x114
[ 24082.548399] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.548399] wm_handle_queue() at netbsd:wm_handle_queue+0xff
[ 24082.548399] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.548399] softint_dispatch() at netbsd:softint_dispatch+0x11c
[ 24082.548399] x86_ipi_handler() at DDB lost frame for netbsd:x86_ipi_handler+0x79
[ 24082.548399] netbsd:Xsoftintr+0x4c, trying 0xffff87043770a0f0
[ 24082.548399] Xsoftintr() at netbsd:Xsoftintr+0x4c
[ 24082.548399] --- interrupt ---
[ 24082.548399] be9875aa43e74af1:
[ 24082.548399] cpu2: End traceback...
[ 24082.548399] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.548399] --- interrupt ---
[ 24082.548399] dumping to dev 18,1 (offset=253015, size=4162677):
[ 24082.548399] dump mutex_enter() at netbsd:mutex_enter+0x50f
[ 24082.548399] pool_get() at netbsd:pool_get+0x78
[ 24082.548399] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.548399] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.548399] m_get() at netbsd:m_get+0x37
[ 24082.558399] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.558399] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.558399] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.558399] soreceive() at netbsd:soreceive+0x66a
[ 24082.558399] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.558399] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.558399] syscall() at netbsd:syscall+0x196
[ 24082.558399] --- syscall (number 155) ---
[ 24082.558399] netbsd:syscall+0x196:
[ 24082.558399] cpu4[3392 slave]: hogging kernel lock
[ 24082.558399] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.558399] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.568400] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.568400] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.568400] --- interrupt ---
[ 24082.568400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.568400] pool_get() at netbsd:pool_get+0x78
[ 24082.568400] pool_cache_get_slow() at netbsd:pool_cache_get_slow+0x13d
[ 24082.568400] pool_cache_get_paddr() at netbsd:pool_cache_get_paddr+0x264
[ 24082.568400] m_get() at netbsd:m_get+0x37
[ 24082.568400] m_gethdr() at netbsd:m_gethdr+0x9
[ 24082.568400] tcp_output() at netbsd:tcp_output+0x135c
[ 24082.578401] tcp_rcvd_wrapper() at netbsd:tcp_rcvd_wrapper+0x67
[ 24082.578401] soreceive() at netbsd:soreceive+0x66a
[ 24082.578401] nfsrv_rcv() at netbsd:nfsrv_rcv+0x174
[ 24082.578401] do_nfssvc.part.0() at netbsd:do_nfssvc.part.0+0x138d
[ 24082.578401] syscall() at netbsd:syscall+0x196
[ 24082.578401] --- syscall (number 155) ---
[ 24082.578401] netbsd:syscall+0x196:
[ 24082.578401] cpu4[3392 slave]: hogging kernel lock
[ 24082.578401] ipi_msg_cpu_handler() at netbsd:ipi_msg_cpu_handler+0x68
[ 24082.578401] ipi_cpu_handler() at netbsd:ipi_cpu_handler+0x99
[ 24082.578401] x86_ipi_handler() at netbsd:x86_ipi_handler+0x79
[ 24082.578401] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x18
[ 24082.578401] --- interrupt ---
[ 24082.588400] mutex_enter() at netbsd:mutex_enter+0x50d
[ 24082.588400] pool_get() at netbsd:pool_get+0x78

Regards,

JB