Subject: Re: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 12/07/2005 11:08:46
A kernel built with DIAGNOSTIC, LOCKDEBUG and DEBUG enabled
produced two panics over the last week:
Nov 30
panic: kernel debugging assertion "(v == __SIMPLELOCK_LOCKED) || (v
== __SIMPLELOCK_UNLOCKED)" failed: file "/u1/netbsd-3.0/src/sys/arch/
x86/x86/lock_machdep.c",
Begin traceback...
__main(c07458f7,c07bbe60,53,c07bbe20,1) at netbsd:__main
__cpu_simple_lock(d0734268,c22ac800,1,286,c22ac800) at
netbsd:__cpu_simple_lock+0xd5
_simple_lock(d0734268,c07bd480,73b,c22ac800,d0734268) at
netbsd:_simple_lock+0x7a
pmap_reference(d0734268,c080207c,52c,297,282) at netbsd:pmap_reference
+0x1a
pmap_load(c03aa14f,cd042000,8062000,52c,cea3f29c) at netbsd:pmap_load
+0xc4
copyout(cd042000,52c,ce48bd14,282,1a000) at netbsd:copyout+0xf
ffs_read(ce48bcb4,cc317ae4,10001,20001,c063e660) at netbsd:ffs_read
+0x4a6
VOP_READ(cc317ae4,ce48bd14,1,cc300804,0) at netbsd:VOP_READ+0x34
vn_rdwr(0,cc317ae4,8062000,52c,1a000) at netbsd:vn_rdwr+0xb4
vmcmd_readvn(cf07caec,c2bc8a1c,bfc00000,0,0) at netbsd:vmcmd_readvn+0x2f
sys_execve(cea3f29c,ce48bf64,ce48bf5c,c08008a4,282) at
netbsd:sys_execve+0x620
syscall_plain() at netbsd:syscall_plain+0x1a5
--- syscall (number 59) ---
0xbdb2b13f:
End traceback...
syncing disks... 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 giving up
Printing vnodes for busy buffers
tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
mode 060640, owner 0, group 5, size 0 not locked
tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
mode 060640, owner 0, group 5, size 0 not locked
tag VT_UFS(1) type VDIR(2), usecount 0, writecount 0, refcount 1,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 43052, on dev 0, 4 flags 0x0, effnlink 3, nlink 3
mode 040755, owner 110, group 202, size 512 not locked
tag VT_UFS(1) type VDIR(2), usecount 0, writecount 0, refcount 1,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 129044, on dev 0, 4 flags 0x0, effnlink 2, nlink 2
mode 040700, owner 0, group 0, size 1536 not locked
tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
mode 060640, owner 0, group 5, size 0 not locked
tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
mode 060640, owner 0, group 5, size 0 not locked
tag VT_UFS(1) type VBLK(3), usecount 1873, writecount 0, refcount 62,
flags (0<VLOCKSWORK>)
tag VT_UFS, ino 249344, on dev 0, 0 flags 0x0, effnlink 1, nlink 1
mode 060640, owner 0, group 5, size 0 not locked
giving up
Dec 4: Note the second panic during the "syncing disks" phase of the
reboot after (or during?) the first panic. Trying to "call
simple_lock_dump" locks up the machine.
panic: kernel diagnostic assertion "vm_map_pmap(map) == pmap_kernel
()" failed: file "/u1/netbsd-3.0/src/sys/uvm/uvm_map.c", line 4151
Begin traceback...
__main(c073fd1d,c07b8e60,1037,c07b89e0,cc317b6c) at netbsd:__main
uvm_kmapent_alloc(d6b4c2a0,0,0,c0869ec0,0) at netbsd:uvm_kmapent_alloc
+0x30b
uvm_mapent_reserve(d6b4c2a0,cd3abd44,2,0,0) at
netbsd:uvm_mapent_reserve+0x54
uvm_unmap1(d6b4c2a0,0,bfc00000,0,c0869ec0) at netbsd:uvm_unmap1+0x1b
uvm_deallocate(d6b4c2a0,0,bfc00000,0,0) at netbsd:uvm_deallocate+0x32
sys_execve(d025ab7c,cd3abf64,cd3abf5c,c08008a4,c039ade7) at
netbsd:sys_execve+0xbd9
syscall_plain() at netbsd:syscall_plain+0x1a5
--- syscall (number 59) ---
0xbdb2b13f:
End traceback...
syncing disks... panic: kernel diagnostic assertion "pmap->pm_pdirpa
== rcr3()" failed: file "/u1/netbsd-3.0/src/sys/arch/i386/i386/pm
Begin traceback...
__main(c073fd1d,c07bd480,867,c0758ea8,c22ac800) at netbsd:__main
pmap_deactivate2(d025ab7c,ce7bd78c,0,0,c03accb9) at
netbsd:pmap_deactivate2+0x63
mpidle(d025ab7c,0,33e,c,0) at netbsd:mpidle+0x92
preempt(1,c07b4220,4e8,cd3ab994,c1788b10) at netbsd:preempt+0x75
genfs_putpages(cd3aba14,1312d00,0,0,c063f020) at netbsd:genfs_putpages
+0x7ec
VOP_PUTPAGES(cf4177b0,0,0,0,0) at netbsd:VOP_PUTPAGES+0x40u0:f
fsspi_fnoulutl_f
Stopped in pid 25812.1 (ps) at netbsd:cpu_Debugger+0x4: leave
db{0}> trace
cpu_Debugger(c0757a97,0,8,283,c08056e0) at netbsd:cpu_Debugger+0x4
__cpu_simple_lock(c0802694,989680,0,202,c086e1a8) at
netbsd:__cpu_simple_lock+0x93
_simple_lock(c0802694,c07afc40,2b7,c080e0a0,c2d4ca5c) at
netbsd:_simple_lock+0x7a
wakeup(c086e1a0,c07ba940,117,c2d4c9dc,c2d4c9e4) at netbsd:wakeup+0x55
uvm_aio_biodone(c2d4c9dc,c07b2f60,57c,282,c2372dd8) at
netbsd:uvm_aio_biodone+0x56
biodone(c2d4c9dc,0,ce957a18,297,c0845968) at netbsd:biodone+0x134
scsipi_complete(c310e038,c22c4000,ce957a58,246,c310e048) at
netbsd:scsipi_complete+0x159
scsipi_done(c310e038,2de,c074683a,8020,c0847a20) at netbsd:scsipi_done
+0x19a
isp_parse_async(c22c4000,8020,0,0,0) at netbsd:isp_parse_async+0x119
isp_intr(c22c4000,8,1,8020,c086a00c) at netbsd:isp_intr+0x1169
isp_pci_intr(c22c4000,10,10,c,0) at netbsd:isp_pci_intr+0x6b
intr_biglock_wrapper(c22e3f80,5,10,30,c0450010) at
netbsd:intr_biglock_wrapper+0x18
Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xa0
--- interrupt ---
Xspllower(5,c07ad7c0,585,246,0) at netbsd:Xspllower+0xe
_kernel_lock(42,c060f000,cc77d440,c2d2da00,c22d8000) at
netbsd:_kernel_lock+0xfd
x86_softintlock(0,c0802694,4,ce957e68,c039ab79) at
netbsd:x86_softintlock+0xd
DDB lost frame for netbsd:Xsoftnet+0x18, trying 0xce957e2c
Xsoftnet() at netbsd:Xsoftnet+0x18
--- interrupt ---
0xce957e98:
db{0}> call simple_lock_dump
cpu0: spinout while in debugger
Here the machine locks up and needs a hard reset.
On Nov 26, 2005, at 18:08 , Manuel Bouyer wrote:
> On Sat, Nov 26, 2005 at 05:18:40PM -0500, Andreas Wrede wrote:
>>
>> On Nov 26, 2005, at 15:29 , Manuel Bouyer wrote:
>>
>>> On Fri, Nov 25, 2005 at 03:13:00AM +0000, Andreas Wrede wrote:
>>>>> Environment:
>>>>
>>>>
>>>> System: NetBSD whome.planix.com 3.0_RC3 NetBSD 3.0_RC3
>>>> (PLANIX.MPACPI) #0: Thu Nov 24 20:57:09 EST 2005
>>>> root@whome.planix.com:/u1/netbsd-3.0/src/sys/arch/i386/compile/
>>>> obj.i386/PLANIX.MPACPI i386
>>>> Architecture: i386
>>>> Machine: i386
>>>>> Description:
>>>> Over the last week I have experienced 3 kernel dead-locks on a
>>>> NetBSD 3.0_RC1/2/3 system.
>>>> The motherboard is a Tyan K8S Pro S2882G3NR with 2 AMD Opteron
>>>> 244 CPUs installed. The kernel
>>>> differs from GENERIC.MPACPI in the value for some SYSVSEM
>>>> variables, maxusers and some
>>>> other variables.
>>>
>>> Can you try a kernel with DIAGNOSTIC, DEBUG and LOCKDEBUG ?
>>
>> Right now, I am running with LOCKDEBUG. I will add DIAGNOSTIC and
>> DEBUG.
>
> Yes, if you have the problem I'm thinking about, it will only be
> detected if you have DIAGNOSTIC. But LOCKDEBUG and DEBUG can't hurt,
> maybe these will catch something else.
>
>>
>> Not knowing much about kernel debugging, and since creating a core
>> dump is not possible,
>
> Why ? Have you tried reboot(0x104) ?
>
>> what commands should I run the next time the
>> dead-lock occurs?
>
> I can't see anything more than what you have provided for now ...
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> NetBSD: 26 years of experience will always make the difference
> --
>
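For reference, the 0x104 argument suggested for reboot() above is a
bitmask of reboot flags. Below is a minimal sketch of decoding it,
assuming the values RB_NOSYNC = 0x004 and RB_DUMP = 0x100 from NetBSD's
<sys/reboot.h> (check your own tree). RB_FLAGS and decode_howto are
hypothetical names used only for illustration, and only these two flags
are listed.

```python
# Assumed flag values, taken from NetBSD's <sys/reboot.h>; verify locally.
RB_FLAGS = {
    0x004: "RB_NOSYNC",  # do not sync the disks before rebooting
    0x100: "RB_DUMP",    # write a crash dump before rebooting
}


def decode_howto(howto):
    """Split a reboot 'howto' value into known flag names and leftover bits."""
    names = [name for bit, name in sorted(RB_FLAGS.items()) if howto & bit]
    known_mask = 0
    for bit in RB_FLAGS:
        known_mask |= bit
    leftover = howto & ~known_mask  # bits this sketch does not know about
    return names, leftover


if __name__ == "__main__":
    names, leftover = decode_howto(0x104)
    print(" | ".join(names), hex(leftover))
```

Read this way, 0x104 asks for a crash dump without first syncing the
disks, which seems relevant here because both panics above were
followed by trouble in the "syncing disks" phase.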
--
aew