Subject: kern/3006: union_lookup possiblly goes into dead lock.
To: None <gnats-bugs@gnats.netbsd.org>
From: =?ISO-2022-JP?B?Ik1JTk9VUkEgTWFrb3RvIC8gGyRCTCcxOhsoQiAbJEI/PxsoQiI=?= <minoura@kw.netlaputa.or.jp>
List: netbsd-bugs
Date: 12/08/1996 17:31:23
>Number:         3006
>Category:       kern
>Synopsis:       union_lookup possiblly goes into dead lock.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Dec  8 04:05:00 1996
>Last-Modified:
>Originator:     Naofumi HONDA / MINOURA Makoto
>Organization:
NetBSD/pc98 Core Team
>Release:        Nov. 30, '96
>Environment:
System: NetBSD daisy 1.2B NetBSD 1.2B (DAISY) #1: Sun Dec 1 00:32:28 JST 1996 root@daisy:/usr/src/sys/arch/i386/compile/DAISY i386


>Description:
	union fs locking protocol is very very complicated and
	still has some posibility of dead lock. Here are examples
	we've found:-

	- union_lookup and union_root may try to lock two vnodes
	simultaneously.
	- While union_lookup is locking a directory, union_fsync
	for this directory may called, which causes deadlock.

>How-To-Repeat:
	Timing problem.... Execute find(1) on busy union filesystem or
	something like that...
>Fix:
	For ideal solution, we should rework the union locking protocol.
	The following is somewhat dirty quick hack written by Naofumi
	HONDA originally for NetBSD/pc98.

*** /sys/miscfs/union/union_vfsops.c.ORIG	Tue Oct 15 01:41:27 1996
--- /sys/miscfs/union/union_vfsops.c	Wed Dec  4 22:36:48 1996
***************
*** 373,378 ****
--- 373,396 ----
  	struct union_mount *um = MOUNTTOUNIONMOUNT(mp);
  	int error;
  	int loselock;
+ 	int lockadj = 0;
+ 
+ 	/* XXX:
+ 	 * Assume mount -t union /usr/obj /usr/src.
+ 	 * When process A looks up /usr/src,
+ 	 * then the union upper vnode is locked first.
+ 	 * Here, another process B calls union_root.
+ 	 * B locks the lower vnode and tries to lock the upper.
+ 	 * However the upper vnode is already locked by A.
+ 	 * Moreover, A tries to lock the lower vnode,
+ 	 * which is already locked by B!!
+ 	 */
+ 	if (um->um_lowervp && um->um_op != UNMNT_BELOW &&
+ 	    VOP_ISLOCKED(um->um_lowervp)) {
+ 		VREF(um->um_lowervp);
+ 		VOP_UNLOCK(um->um_lowervp);
+ 		lockadj = 1;
+ 	}
  
  	/*
  	 * Return locked reference to root.
***************
*** 404,409 ****
--- 422,432 ----
  	} else {
  		if (loselock)
  			VTOUNION(*vpp)->un_flags &= ~UN_ULOCK;
+ 	}
+ 
+ 	if (lockadj) {
+ 		VOP_LOCK(um->um_lowervp);
+ 		vrele(um->um_lowervp);
  	}
  
  	return (error);
*** union/union_vnops.c.ORIG	Tue Oct 15 01:42:03 1996
--- union_vnops.c	Fri Dec  6 22:26:08 1996
***************
*** 1024,1042 ****
  		struct proc *a_p;
  	} */ *ap = v;
  	int error = 0;
  	struct vnode *targetvp = OTHERVP(ap->a_vp);
  
  	if (targetvp != NULLVP) {
  		int dolock = (targetvp == LOWERVP(ap->a_vp));
  
  		if (dolock)
  			VOP_LOCK(targetvp);
! 		else
! 			FIXUP(VTOUNION(ap->a_vp));
  		error = VOP_FSYNC(targetvp, ap->a_cred,
  					ap->a_waitfor, ap->a_p);
  		if (dolock)
  			VOP_UNLOCK(targetvp);
  	}
  
  	return (error);
--- 1024,1064 ----
  		struct proc *a_p;
  	} */ *ap = v;
  	int error = 0;
+ 	int lockadj = 0;
+ 	struct union_node *un;
  	struct vnode *targetvp = OTHERVP(ap->a_vp);
  
  	if (targetvp != NULLVP) {
  		int dolock = (targetvp == LOWERVP(ap->a_vp));
  
+ 		un = VTOUNION(ap->a_vp);
  		if (dolock)
  			VOP_LOCK(targetvp);
! 		else if ((un->un_flags & UN_ULOCK) == 0 &&
! 			 VOP_ISLOCKED(targetvp) == 0) {
! 			/* XXX:
! 			 * To avoid this kind of deadlock, check if
! 			 * someone is locking.  Actually, we have to check
! 			 * who is locking...
! 			 * 1) union_lookup XXX. union_lookup calls ufs_lookup,
! 			 *   and locks XXX.
! 			 * 2) Here, imagine VOP_FSYNC is called from vinvalbuf.
! 			 * 3) union_fsync tries to lock the XXX vnode.
! 			 * Boo Boo!!!
! 			 */
! 			lockadj = 1;
! 			VOP_LOCK(targetvp);
! 			un->un_flags |= UN_ULOCK;
! 		}
! 
  		error = VOP_FSYNC(targetvp, ap->a_cred,
  					ap->a_waitfor, ap->a_p);
  		if (dolock)
  			VOP_UNLOCK(targetvp);
+ 		if (lockadj) {
+ 			VOP_UNLOCK(targetvp);
+ 			un->un_flags &= ~UN_ULOCK;
+ 		}
  	}
  
  	return (error);
>Audit-Trail:
>Unformatted: