tech-kern archive
Re: tstile syndrome
On Thu, Aug 27, 2009 at 01:09:16PM +0200, Manuel Bouyer wrote:
> Hi,
> here's what I found so far on a server that shows the tstile hang,
> with some ddb+gdb playing.
>
> Most processes are waiting on a turnstile (you did know that);
> the one I started with had more than 4000 writers in the queue.
> The threads came here through a VOP_LOCK() (you also knew that).
> This is a turnstile for a rwlock, and I found the owner of this rwlock.
> This thread is also waiting on a turnstile, but a different one;
> it also came here through a VOP_LOCK(). This is also a turnstile for a
> rwlock, which also has an owner, which also has VOP_LOCK in its stack
> trace and is waiting on a turnstile. It's also a rwlock (I checked the
> l_syncobj) but l_wchan is bogus: ffff800079ac402f, this is not a
> valid krwlock_t* (and examining memory at this address doesn't look like
> a valid krwlock_t value, and 'show lock' doesn't know about it either).
I think I mixed up pointers and values at one point.
I got another instance of the tstile deadlock and I think I found the
cause:
ffff800079987800 wchan_t 0xffff80008e572958 syncobj 0xffffffff806cf280 rw
    owner 0xffff8000d47a3baf
ffff8000d47a3ba0 wchan_t 0xffff80008f301290 syncobj 0xffffffff806cf280 rw
    owner 0xffff80007998780f
So ffff800079987800 is waiting on a lock held by 0xffff8000d47a3ba0, and
ffff8000d47a3ba0 is waiting on a lock held by 0xffff800079987800.
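
To make the reading of the owner words explicit, here is a rough sketch (not kernel code) of how the two owner values map back to LWP addresses and form a cycle. It assumes the low four bits of an rwlock owner word are flag bits, as with RW_FLAGMASK in <sys/rwlock.h>, and that both locks are write-held so the remaining bits are the owning LWP. The struct waiter type and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: low four bits of the owner word are flags (cf. RW_FLAGMASK). */
#define OWNER_FLAGMASK UINT64_C(0x0f)

/* Hypothetical record: one blocked LWP and the raw owner word of its lock. */
struct waiter {
	uint64_t lwp;      /* address of the blocked LWP */
	uint64_t rw_owner; /* raw owner word of the rwlock it sleeps on */
};

/* Strip the flag bits to recover the owning LWP of a write-held rwlock. */
static uint64_t
rw_owner_lwp(uint64_t owner)
{
	return owner & ~OWNER_FLAGMASK;
}

/* Return the index of the waiter whose LWP address is lwp, or -1. */
static int
find_waiter(const struct waiter *w, size_t n, uint64_t lwp)
{
	for (size_t i = 0; i < n; i++)
		if (w[i].lwp == lwp)
			return (int)i;
	return -1;
}

/*
 * Follow the "waits on a lock owned by" chain starting at w[start].
 * Return 1 if the chain leads back to w[start], 0 otherwise.
 */
static int
in_wait_cycle(const struct waiter *w, size_t n, size_t start)
{
	assert(start < n);
	uint64_t cur = w[start].lwp;

	/* A cycle through start, if any, closes within n hops. */
	for (size_t hops = 0; hops < n; hops++) {
		int i = find_waiter(w, n, cur);
		if (i < 0)
			return 0; /* owner is not blocked on a known lock */
		cur = rw_owner_lwp(w[i].rw_owner);
		if (cur == w[start].lwp)
			return 1;
	}
	return 0;
}
```

Fed the two entries above (LWP ffff800079987800 with owner 0xffff8000d47a3baf, and LWP ffff8000d47a3ba0 with owner 0xffff80007998780f), in_wait_cycle() reports a cycle from either starting point.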
Here are the stack traces for both processes:
db{0}> tr/a ffff800079987800
trace: pid 21115 lid 1 at 0xffff80007931c710
sleepq_block() at netbsd:sleepq_block+0xec
turnstile_block() at netbsd:turnstile_block+0x29e
rw_vector_enter() at netbsd:rw_vector_enter+0x28c
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
wapbl_ufs_rename() at netbsd:wapbl_ufs_rename+0x5ab
ufs_rename() at netbsd:ufs_rename+0x39
VOP_RENAME() at netbsd:VOP_RENAME+0x75
do_sys_rename() at netbsd:do_sys_rename+0x57d
syscall() at netbsd:syscall+0xb6
db{0}> tr/a ffff8000d47a3ba0
trace: pid 25624 lid 1 at 0xffff8000d47cb650
sleepq_block() at netbsd:sleepq_block+0xec
turnstile_block() at netbsd:turnstile_block+0x29e
rw_vector_enter() at netbsd:rw_vector_enter+0x28c
vlockmgr() at netbsd:vlockmgr+0xf6
VOP_LOCK() at netbsd:VOP_LOCK+0x64
vn_lock() at netbsd:vn_lock+0xd9
cache_lookup() at netbsd:cache_lookup+0x201
ufs_lookup() at netbsd:ufs_lookup+0xcd
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x80
lookup() at netbsd:lookup+0x34b
namei() at netbsd:namei+0x1a4
do_sys_stat() at netbsd:do_sys_stat+0x44
sys___lstat30() at netbsd:sys___lstat30+0x2a
syscall() at netbsd:syscall+0xb6
Any idea how to fix this?
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference