Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 03/16/2003 15:23:05
I found that the double-vput problem in vfs_lookup was due to a vnode
with type VBAD. This is passed to vfs_lookup from coda_symlink.
Most of the time, the coda_call to symlink in coda_symlink works, but
occasionally the call returns without error and the vnode is marked
VBAD.
I checked for VBAD, and returned -1, but promptly got a panic in
nfs_symlink, I think because an mbuf that had been freed was trashed,
or it was just a bad pointer.
So, I'm guessing that the coda kernel code occasionally messes up, or
there is some locking problem where the vnode gets modified/marked bad
by something else. This is all on a 192 MB i386 running
cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc. and 3 more
gettys. There is basically nothing else going on, and the machine was
freshly booted.
I am just beginning to grasp the locking rules, and I'd appreciate
being set straight if I am confused (and thanks to those who already
responded):
The interlock in the vnode protects the vnode ref counts and a few
other fields in the struct vnode. It is held for short periods only
and is not about locking the vnode itself.
Having a reference, expressed via the ref count field, protects you
against the vnode going away or turning into something completely
different. But it does not guarantee anything about operations on
the vnode; to serialize those, the vn_lock is used.
struct lock v_lock in the vnode protects the vnode in the larger
context in terms of fs operations.
When the comments say 'the locked vnode', they always mean the
struct lock in the vnode (or rather v->v_vnlock, which in the coda
case always points to v->v_lock since there is no stackable fs stuff
going on).
Little mention is made of the interlock in terms of locking
discussions, other than in vnode(9), because that's too obvious.
vput, for example, expects that the interlock is not held. It
unlocks *v->v_vnlock, and then decrements usecount. To do the
latter, it has to acquire the interlock, but that's not mentioned.
One should in general not hold the interlock when calling VOP_LOCK
and VOP_UNLOCK or other vnops. But some operations take the
LK_INTERLOCK flag to indicate that the interlock is already held.
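To make my understanding of the rules above concrete, here is a toy,
single-threaded userspace model of them. It is only a sketch: every
toy_* name is made up for illustration, the "locks" are plain flags
that assert on misuse, and none of this is the real kernel code. The
field names v_interlock, v_usecount, v_lock and v_vnlock are borrowed
from struct vnode just to keep the mapping obvious.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; the toy_* names are hypothetical, not NetBSD APIs. */
struct toy_lock { bool held; };

/* A flag "lock" that asserts on misuse instead of blocking. */
static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct toy_vnode {
    struct toy_lock  v_interlock; /* short-term: guards v_usecount */
    int              v_usecount;  /* a reference keeps the vnode from going away */
    struct toy_lock  v_lock;      /* long-term: serializes fs operations */
    struct toy_lock *v_vnlock;    /* points at v_lock when nothing is stacked */
};

static void toy_vinit(struct toy_vnode *vp)
{
    vp->v_interlock.held = false;
    vp->v_lock.held = false;
    vp->v_vnlock = &vp->v_lock;
    vp->v_usecount = 0;
}

/* Take a reference: only the interlock is needed, and only briefly. */
static void toy_vref(struct toy_vnode *vp)
{
    toy_acquire(&vp->v_interlock);
    vp->v_usecount++;
    toy_release(&vp->v_interlock);
}

/* Lock the vnode for fs operations; the interlock must not be held. */
static void toy_vn_lock(struct toy_vnode *vp)
{
    assert(!vp->v_interlock.held);
    toy_acquire(vp->v_vnlock);
}

/* Model of vput: unlock the vnode, then drop one reference.
 * Returns the new use count. */
static int toy_vput(struct toy_vnode *vp)
{
    int n;

    assert(!vp->v_interlock.held); /* caller must not hold the interlock */
    toy_release(vp->v_vnlock);     /* asserts that the vnode lock was held */
    toy_acquire(&vp->v_interlock); /* taken internally to touch the count */
    n = --vp->v_usecount;
    toy_release(&vp->v_interlock);
    return n;
}
```

In this model a second toy_vput on the same vnode trips the assertion
in toy_release, because the vnode lock is no longer held. That is the
toy analogue of the double-vput I was chasing.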
So, is it reasonable for an unlocked vnode to change to VBAD?
Does holding the vn_lock mean that vgone should not be called?
Is there any place else I should suspect that is changing the type to
VBAD?
Greg Troxel <gdt@ir.bbn.com>