Subject: Re: kern/33060: another vnlock deadlock
To: None <gnats-bugs@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: netbsd-bugs
Date: 03/14/2006 07:04:07
the problem is a deadlock between these two threads:

    0 23150 -1111682260   0  -6  0    200   0 fsslock  RW   ?      0:00.00 (dump)  ce1a4000

(gdb) pcb 0xce1a4000
(gdb) bt
#0  0xcf9eb974 in ?? ()
#1  0xc02362d3 in bpendtsleep () at ../../../../kern/kern_synch.c:508
#2  0xc020d554 in fss_ioctl (dev=41728, cmd=2148033640,
    data=0xce1a7c74 "\004", flag=1, p=0xd0ad21a8) at ../../../../dev/fss.c:278  
#3  0xc0277e17 in spec_open (v=0xce1a7ce4) at ../../../../sys/proc.h:388
#4  0xc02717e1 in vn_open (ndp=0xce1a7eb4, fmode=3, cmode=0)
    at ../../../../sys/vnode_if.h:212
#5  0xc026b43b in sys_open (l=0xcf9eb974, v=0xce1a7f64, retval=0xce1a7f5c)
    at ../../../../kern/vfs_syscalls.c:1159
#6  0xc02add87 in syscall_plain (frame=0xce1a7fa8)
    at ../../../../arch/i386/i386/syscall.c:161


    0  9461 -1111682260   0  -2  0    200   0 vnlock   RW   ?      0:00.00 (dump)  ce24c000

(gdb) pcb 0xce24c000
(gdb) bt
#0  0xcdfb7004 in ?? ()
#1  0xc02362d3 in bpendtsleep () at ../../../../kern/kern_synch.c:508
#2  0xc0226546 in acquire (lkpp=0xce24f7c4, s=0xce24f7ac, extflags=0, drain=0,  
    wanted=1536) at ../../../../kern/kern_lock.c:264
#3  0xc0226c2e in lockmgr (lkp=0xcfef5130, flags=65538, interlkp=0xcfef50c0)
    at ../../../../kern/kern_lock.c:785
#4  0xc0273c6f in genfs_lock (v=0xce24f7f4)
    at ../../../../miscfs/genfs/genfs_vnops.c:338
#5  0xc027265f in vn_lock (vp=0xcfef50c0, flags=65538)
    at ../../../../sys/vnode_if.h:1273
#6  0xc01e9fb7 in ffs_snapshot (mp=0xc28ec000, vp=0xd33c70bc, ctime=0xce24fb6c) 
    at ../../../../ufs/ffs/ffs_snapshot.c:381
#7  0xc020e282 in fss_create_files (sc=0xc03b1340, fss=0xce24fea4,
    bsize=0xce24fce4, p=0xceeca33c) at ../../../../dev/fss.c:599
#8  0xc020e355 in fss_create_snapshot (sc=0xc03b1340, fss=0xce24fea4,
    p=0xceeca33c) at ../../../../dev/fss.c:693
#9  0xc020d6c0 in fss_ioctl (dev=41728, cmd=2148288000,
    data=0xce24fea4 "\006\b", flag=0, p=0xceeca33c)
    at ../../../../dev/fss.c:294
#10 0xc02784ba in spec_ioctl (v=0xce24fd84)
    at ../../../../miscfs/specfs/spec_vnops.c:488
#11 0xc02723bb in vn_ioctl (fp=0xceed64e4, com=2148288000, data=0xce24fea4,
    p=0xceeca33c) at ../../../../sys/vnode_if.h:500
#12 0xc02476a2 in sys_ioctl (l=0xcdfb7004, v=0xce24ff64, retval=0xce24ff5c)
    at ../../../../kern/sys_generic.c:613
#13 0xc02add87 in syscall_plain (frame=0xce24ffa8)
    at ../../../../arch/i386/i386/syscall.c:161


(gdb) p *lkp
$1 = {lk_interlock = {lock_data = 0}, lk_flags = 525312, lk_sharecount = 0,
  lk_exclusivecount = 1, lk_recurselevel = 0, lk_waitcount = 2,
  lk_wmesg = 0xc0346683 "vnlock", lk_un = {lk_un_sleep = {
      lk_sleep_lockholder = 23150, lk_sleep_locklwp = 1, lk_sleep_prio = 20,
      lk_sleep_timo = 0, lk_newlock = 0x0}, lk_un_spin = {
      lk_spin_cpu = 23150}}}
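
the cmd values in the two fss_ioctl() frames can be pulled apart to see what each thread was asking for. the sketch below assumes the usual BSD ioctl encoding (direction in the top three bits, a 13-bit parameter length at bit 16, the group character at bit 8, the command number in the low byte); the MODEL_ macros and ioc_decode() are names made up for this example, not kernel identifiers.

```c
#include <stdio.h>

/* Field layout assumed from the usual BSD ioctl encoding. */
#define MODEL_IOCPARM_MASK 0x1fffUL      /* 13-bit parameter length */
#define MODEL_IOC_DIRMASK  0xe0000000UL  /* direction bits */
#define MODEL_IOC_IN       0x80000000UL  /* data copied into the kernel */

/* Hypothetical helper type holding the decoded fields of one command. */
struct ioc_fields {
	unsigned long dir;   /* direction bits, still in place */
	unsigned long len;   /* size of the argument structure */
	char group;          /* group character, e.g. 'd' */
	unsigned long num;   /* command number within the group */
};

/* Split a raw ioctl command word into its fields. */
static struct ioc_fields
ioc_decode(unsigned long cmd)
{
	struct ioc_fields f;

	f.dir = cmd & MODEL_IOC_DIRMASK;
	f.len = (cmd >> 16) & MODEL_IOCPARM_MASK;
	f.group = (char)((cmd >> 8) & 0xff);
	f.num = cmd & 0xff;
	return f;
}

/* Print one decoded command in a compact form. */
static void
ioc_print(unsigned long cmd)
{
	struct ioc_fields f = ioc_decode(cmd);

	printf("cmd 0x%lx: group '%c', number %lu, length %lu\n",
	    cmd, f.group, f.num, f.len);
}
```

under that assumption, calling ioc_print(2148033640UL) for the first thread gives group 'd', number 104, length 8, and ioc_print(2148288000UL) for the second gives group 'F', number 0, length 12. a 'd' group command arriving at fss from spec_open() fits the picture of a request the driver does not implement.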


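pid 23150 holds the vnode lock (lk_sleep_lockholder = 23150 above) and is
sleeping on the fss lock in fss_ioctl(), while pid 9461 holds the fss lock
and is sleeping on that vnode lock in ffs_snapshot().
the locking order has to be fss_lock -> vn_lock, so we need to avoid
sleeping for the fss lock when we're already holding a vnode lock.
the first thread doesn't actually need to take the fss lock since its
ioctl is one that fss doesn't handle and is just going to fail immediately
anyway, so we just need to skip taking the fss lock in fss_ioctl() for
unknown ioctls.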
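
the shape of that change can be sketched in a small user-space model. this is not the fss.c code: the model_ names, the command numbers, the flag standing in for the fss lock, and the choice of ENOTTY as the failure code are all assumptions made for the example. the only point it shows is that the command is checked before the lock is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command numbers standing in for the ioctls fss handles. */
enum { MODEL_SET = 1, MODEL_CLR = 2, MODEL_GET = 3 };

/* Hypothetical per-device state; "locked" stands in for the fss lock. */
struct model_softc {
	bool locked;     /* true while the modelled lock is held */
	int lock_count;  /* how many times the lock has been taken */
};

/* True only for commands the modelled driver implements. */
static bool
model_known_cmd(unsigned long cmd)
{
	return cmd == MODEL_SET || cmd == MODEL_CLR || cmd == MODEL_GET;
}

static void
model_lock(struct model_softc *sc)
{
	assert(!sc->locked);  /* the real lock would sleep here instead */
	sc->locked = true;
	sc->lock_count++;
}

static void
model_unlock(struct model_softc *sc)
{
	assert(sc->locked);
	sc->locked = false;
}

/*
 * Reject unknown commands before taking the lock, so a caller that
 * already holds a vnode lock never sleeps here for a request that
 * would fail anyway.
 */
static int
model_ioctl(struct model_softc *sc, unsigned long cmd)
{
	int error = 0;

	if (!model_known_cmd(cmd))
		return ENOTTY;  /* assumed error code for this model */

	model_lock(sc);
	/* the real work for MODEL_SET, MODEL_CLR, MODEL_GET would go here */
	model_unlock(sc);
	return error;
}
```

with a zeroed struct model_softc, an unknown command such as model_ioctl(&sc, 99) returns the error and leaves lock_count at 0, while model_ioctl(&sc, MODEL_SET) takes and releases the lock once.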

-Chuck