Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
To: Greg Troxel <gdt@ir.bbn.com>
From: Jaromir Dolecek <jdolecek@netbsd.org>
List: tech-kern
Date: 03/16/2003 21:50:45
The locking rules for the symlink vnode op changed some time ago (rev. 1.26
of coda/coda_vnops.c). Perhaps that change triggered some
problem in coda?
I'd probably also check that the lookup() call in coda_symlink()
succeeds, and that nd.ni_vp is indeed NULL in that case, since
that appears to be what the code assumes.
Jaromir

Greg Troxel wrote:
> I found that the double-vput problem in vfs_lookup was due to a vnode
> with type VBAD. This is passed to vfs_lookup from coda_symlink.
> Most of the time, the coda_call to symlink in coda_symlink works, but
> occasionally the call returns without error and the vnode is marked
> VBAD.
>
> I checked for VBAD, and returned -1, but promptly got a panic in
> nfs_symlink, I think because an mbuf that had been free()'d was
> trashed, or there was just a bad pointer.
>
> So, I'm guessing that the coda kernel code occasionally messes up, or
> there is some locking problem where the vnode gets modified/marked bad
> by something else. This is all on a 192 MB i386 running
> cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc. and 3 more
> gettys. There is basically nothing else going on, and the machine was
> freshly booted.
>
> I am just beginning to grasp the locking rules, and I'd appreciate
> being set straight if I am confused (and thanks to those who already
> responded):
>
> the interlock in the vnode protects the vnode ref counts and a few
> other fields in the struct vnode. It is held for short periods only
> and is not about locking the vnode itself.
>
> Having a reference, expressed via the ref count field, protects you
> against the vnode going away or turning into something completely
> different. But it does not guarantee anything about operations on
> the vnode; to serialize those, the vn_lock is used.
>
> struct lock v_lock in the vnode protects the vnode in the larger
> context in terms of fs operations.
>
> When the comments say 'the locked vnode', they always mean the
> struct lock in the vnode (or rather v->v_vnlock, which in the coda
> case always points to v->v_lock since there is no stackable fs stuff
> going on).
>
> Little mention is made of the interlock in locking
> discussions, other than in vnode(9), because that's too obvious.
>
> vput, for example, expects that the interlock is not held. It
> unlocks *v->v_vnlock, and then decrements usecount. To do the
> latter, it has to acquire the interlock, but that's not mentioned.
>
> One should in general not hold the interlock when calling VOP_LOCK
> and VOP_UNLOCK or other vnops. But some operations take the
> LK_INTERLOCK flag to indicate that the interlock is already held.
>
> So, is it reasonable for an unlocked vnode to change to VBAD?
>
> Does holding the vn_lock mean that vgone should not be called?
>
> Is there any place else I should suspect that is changing the type to
> VBAD?
>
> Greg Troxel <gdt@ir.bbn.com>
>
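The rules summarized in the quoted message (the interlock guards the use count and is held only briefly, the vnode lock serializes operations, and vput unlocks the vnode and then drops a reference) can be modelled in userspace roughly as below. This is only a single-threaded sketch that follows the description above, not the actual kernel source: the names model_vnode, model_vref, model_vn_lock and model_vput are made up for illustration, and the locks are plain flags so that a rule violation trips an assert instead of deadlocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_vnode {
    bool interlock_held; /* stands in for the short-term interlock   */
    bool vnlock_held;    /* stands in for the lock behind v_vnlock   */
    int  usecount;       /* reference count, guarded by the interlock */
};

static void model_interlock(struct model_vnode *vp)
{
    assert(!vp->interlock_held);
    vp->interlock_held = true;
}

static void model_interunlock(struct model_vnode *vp)
{
    assert(vp->interlock_held);
    vp->interlock_held = false;
}

/* Take a reference: only the interlock is involved, not the vnode lock. */
static void model_vref(struct model_vnode *vp)
{
    model_interlock(vp);
    vp->usecount++;
    model_interunlock(vp);
}

/* Lock the vnode for an operation; the caller holds a reference
 * and does not hold the interlock. */
static void model_vn_lock(struct model_vnode *vp)
{
    assert(!vp->interlock_held);
    assert(vp->usecount > 0);
    assert(!vp->vnlock_held);
    vp->vnlock_held = true;
}

/* Model of vput as described above: unlock the vnode, then take the
 * interlock briefly to drop one reference. Returns the remaining count.
 * A second call without re-locking trips the vnlock_held assertion. */
static int model_vput(struct model_vnode *vp)
{
    int left;

    assert(!vp->interlock_held);
    assert(vp->vnlock_held);
    vp->vnlock_held = false;

    model_interlock(vp);
    assert(vp->usecount > 0);
    left = --vp->usecount;
    model_interunlock(vp);
    return left;
}
```

In this model, calling model_vref() and model_vn_lock() followed by a single model_vput() leaves the vnode unlocked with a use count of zero. A second model_vput() on the same vnode fails an assertion, which is the userspace analogue of the double-vput problem discussed above.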
--
Jaromir Dolecek <jdolecek@NetBSD.org> http://www.NetBSD.org/
-=- We should be mindful of the potential goal, but as the tantric -=-
-=- Buddhist masters say, ``You may notice during meditation that you -=-
-=- sometimes levitate or glow. Do not let this distract you.'' -=-