Subject: [solution] vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 03/16/2003 20:23:01
I think I have found the problem, but I'm not 100% sure since the
crashes were not repeatable by a fixed number of operations. It used
to crash after 3-5 'modify buffer, ^X u' operations, and now I've gone
20 or so. (duh this is emacs so off goes a keyboard macro to do '
^X^S' 200 times, causing massive coda lock printouts
(coda_lockdebug=1). After, repeating with ' ^Xu' 200 times, I'm
pretty convinced that at the very least the code is closer to correct
In coda/coda_vnops.c:coda_symlink, vput is called before lookup on the
parent vnode. This seemed wrong to me - dropping a reference and then
using the vnode. I took out the vput and got 'locking against
myself'. After reading the code more, I came to the conclusion:
lookup expects a referenced vnode, but not one that's locked
So, I replaced vput with VOP_UNLOCK, so that the vn_lock in lookup
would be happy, but without dropping the reference. I think this lost
reference was casuing the vnode to get reclaimed at a surprising time,
causing the VBAD symptom.
I'd appreciate comments from those who are more vfs-clueful than I am
- is my interpretation of lookup's calling conventions and the
referencing rules correct?
Here is a patch against netbsd-1-6 as of 20030206 (from my own repo).
The first and third hunks are attempts to notice the VBAD problem
earlier rather than later; the second hunk is the fix - just dropping
vput in favor of VOP_UNLOCK.
I'm happy to send a clean patch and file a PR if that's helpful.
An aside: I suppose that since lookup(9) and the comments in the
sources don't say that the vnode is to be locked on entry, therefore
the caller should hold a reference and the vnode should be unlocked.
This is arguably implied by vnode(9), but it took me a long time to
figure this out. The notion of holding a reference is pretty
straightforward, but the notion that one passes that reference into a
function for functions that don't "return" their directory arguments
was a bit confusing - some explicit language about when functions
consume references and that returned values are ref'd might be
helpful. Now that I understand it it seems obvious, so it could just
be me being confused.
Thanks to Bill and Jaromir for helpful hints in understanding vnodes!
--- coda_vnops.c.~1.1.1.2.~ Wed Dec 5 23:27:40 2001
+++ coda_vnops.c Sun Mar 16 19:50:58 2003
@@ -1640,8 +1640,15 @@
error = venus_symlink(vtomi(tdvp), &tdcp->c_fid, path, plen, nm, len, tva, cred, p);
/* Invalidate the parent's attr cache, the modification time has changed */
+ /* XXX what does this do to the vnode? */
tdcp->c_flags &= ~C_VATTR;
+ if (tdvp->v_type == VBAD) {
+ error = ENODEV; /* not really right, but want unique */
+ vprint("venus_symlink VBAD: ", tdvp);
+ return error; /* XXX CODADEBUG */
+ }
+
if (!error)
{
struct nameidata nd;
@@ -1652,9 +1659,16 @@
nd.ni_cnd.cn_pnbuf = (char *)nm;
nd.ni_cnd.cn_nameptr = nd.ni_cnd.cn_pnbuf;
nd.ni_pathlen = len;
- vput(tdvp);
+
+ /*
+ * tdvp is refed and VOP_LOCKed. lookup seems to require it to
+ * be referenced (as do all calls ?) but not locked - it locks it.
+ * So, drop the lock but not the reference.
+ */
+ VOP_UNLOCK(tdvp, 0);
error = lookup(&nd);
*ap->a_vpp = nd.ni_vp;
+
}
/*
@@ -1980,6 +1994,10 @@
{
struct cnode *cp;
int err;
+
+ if (type == VBAD) {
+ panic("make_coda_node: VBAD");
+ }
if ((cp = coda_find(fid)) == NULL) {
struct vnode *vp;