Subject: locking bug in coda_lookup?
To: None <tech-kern@netbsd.org>
From: Greg Troxel <gdt@ir.bbn.com>
List: tech-kern
Date: 08/31/2004 09:48:33

I got a panic in coda_lookup:

	unlocked parent but couldn't lock child

while doing

	find . -type f -print0 | xargs -0 cat > /dev/null

in coda. I'm not certain which of two very similar fragments the code
was in (don't have netbsd.gdb any more), but it was very much like
this:

	if (*ap->a_vpp) {
		if ((error = vn_lock(*ap->a_vpp, LK_EXCLUSIVE))) {
			printf("coda_lookup: ");
			panic("unlocked parent but couldn't lock child");
		}
	}

It seems that this will fail if someone else has the vnode locked, and
there will be no retry. From reading vn_lock, it seems that if one
passes LK_NOWAIT, vn_lock returns immediately on encountering a locked
vnode. If neither LK_NOWAIT nor LK_RETRY is set, it seems that vn_lock
will tsleep on the vnode's v_interlock and then return ENOENT instead
of retrying. The ufs code uses LK_RETRY in what I think is the
analogous case.

I don't understand why it makes sense to sleep and not retry the lock:
if one isn't going to retry, what's the point of sleeping?

So I think that coda_lookup should pass LK_EXCLUSIVE | LK_RETRY. But
my understanding of vnode locking is shaky at best, so I'd appreciate
advice here.

Also, it seems that the coda lookup operation doesn't properly handle
the IS_DOTDOT case, where the locking rules are different. That could
be the cause of my crash instead.