Subject: Re: nfsd: locking botch in op %d
To: Frank van der Linden <fvdl@wasabisystems.com>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 03/08/2001 12:06:33
>> The NFS server on my house LAN's NFS subnet fell over with "nfsd:
>> locking botch in op 3".
> Yes, this has been seen before. The case that was reported before
> was a netbsd-1-5 branch kernel as a server, and a Linux client,
> running 'du -a'. It also crashed when doing a lookup for a device
> node ("sd0a" in your case), curiously enough, so there may be a
> problem there.
I pulled in just the change to make it call printlockedvnodes() when
this happened (and just print, not panic, despite the comment). I
built that kernel overnight, and today I find....
    Locked vnodes
    tag 1 type VDIR, usecount 1, writecount 0, refcount 1,
    tag VT_UFS, ino 11726, on dev 7, 0 flags 0x0, effnlink 2, nlink 2
    mode 040755, owner 101, group 0, size 512 lock type vnlock: EXCL (count 1) by pid 246
    nfsd: locking botch in op 3 (before 0, after 1)
This is very interesting on three counts:
(1) dev 7,0 is not NFS-exported; that's the root filesystem (which is
on the server's sd0a; I note the lookup which fell over was for
sd0a in the client's filesystem, which has the same major/minor
numbers, and that makes me wonder if this may have something to do
with checkalias()).
(2) inode 11726 on 7,0 has nothing whatever to do with NFS; it's
/home/mouse/.prompt/, a directory that's nowhere near anything an
NFS client could be touching (only /nfs is exported).
(3) pid 246 is the shell in one of my windows; again, nothing whatever
to do with anything an NFS client could be going near.
How p->p_locks could be 1 for the server process when the only locked
vnode is locked by a completely unrelated process is a mystery to me.
Perhaps it's holding something other than a vnode locked?
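For readers following along, here is a minimal user-space sketch of the
kind of before/after comparison the "(before 0, after 1)" message
suggests. It is not the kernel's code: struct proc_model, op_fn,
run_op_checked(), balanced_op() and leaky_op() are all hypothetical
names, and the only thing taken from the report is the p_locks counter
and the wording of the message. Note that the check only compares
counts; it says nothing about what kind of lock changed the count.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct proc; only the
 * per-process lock counter is modelled. */
struct proc_model {
    int p_locks;    /* number of locks this process believes it holds */
};

/* Hypothetical handler type for one server operation. */
typedef void (*op_fn)(struct proc_model *);

/* Run one operation and compare the lock count before and after.
 * Returns 1 (and prints the message) if the count changed, else 0. */
static int run_op_checked(struct proc_model *p, int opnum, op_fn op)
{
    int before = p->p_locks;
    int after;

    op(p);
    after = p->p_locks;
    if (before != after) {
        printf("nfsd: locking botch in op %d (before %d, after %d)\n",
               opnum, before, after);
        return 1;
    }
    return 0;
}

/* An operation that releases everything it acquires. */
static void balanced_op(struct proc_model *p)
{
    p->p_locks++;   /* acquire */
    p->p_locks--;   /* release */
}

/* An operation that returns with one lock still counted. */
static void leaky_op(struct proc_model *p)
{
    p->p_locks++;   /* acquire, never released */
}
```

Calling run_op_checked(&p, 3, leaky_op) on a zeroed struct proc_model
reports the mismatch, while balanced_op passes silently.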
> Unfortunately, tracking this down basically means either reading
> through a lot of code, or changing every vnode lock call into a debug
> statement, saving the current line and file, as well as maintaining a
> linked list of locked vnodes for each process.
If the implication above that it's not a vnode that the server process
has locked is correct, changing vnode lock calls wouldn't help much.
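The bookkeeping Frank describes (record file and line at each lock call,
keep a per-process list of what is held) can be sketched in user space
as below. This is an illustration under stated assumptions, not a patch:
struct lock_record, struct proc_model, track_lock(), track_unlock(),
dump_held() and the TRACK_LOCK() macro are hypothetical names, and
objects are identified by plain pointers rather than real vnodes.

```c
#include <stdio.h>
#include <stdlib.h>

/* One held lock: which object, and where in the source it was taken. */
struct lock_record {
    const void *obj;
    const char *file;
    int line;
    struct lock_record *next;
};

/* Hypothetical stand-in for struct proc: a counter plus the list. */
struct proc_model {
    int p_locks;
    struct lock_record *held;
};

/* Record a lock acquisition. Returns 0 on success, -1 if out of memory. */
static int track_lock(struct proc_model *p, const void *obj,
                      const char *file, int line)
{
    struct lock_record *r = malloc(sizeof(*r));

    if (r == NULL)
        return -1;
    r->obj = obj;
    r->file = file;
    r->line = line;
    r->next = p->held;
    p->held = r;
    p->p_locks++;
    return 0;
}

/* Remove the record for obj. Returns 0 if found, -1 if it was not held. */
static int track_unlock(struct proc_model *p, const void *obj)
{
    struct lock_record **rp;

    for (rp = &p->held; *rp != NULL; rp = &(*rp)->next) {
        if ((*rp)->obj == obj) {
            struct lock_record *r = *rp;

            *rp = r->next;
            free(r);
            p->p_locks--;
            return 0;
        }
    }
    return -1;
}

/* Print every lock still held; returns how many there were. */
static int dump_held(const struct proc_model *p)
{
    const struct lock_record *r;
    int n = 0;

    for (r = p->held; r != NULL; r = r->next) {
        printf("lock on %p taken at %s:%d\n", r->obj, r->file, r->line);
        n++;
    }
    return n;
}

/* Capture the call site automatically. */
#define TRACK_LOCK(p, obj) track_lock((p), (obj), __FILE__, __LINE__)
```

If the stray count comes from something other than a vnode lock, a hook
like this would have to sit at whatever layer increments the counter,
not only at the vnode lock calls, or the list would come up empty.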
> If you could look into this one as well, that'd be great.
I'll be looking into it further and will report anything I find. As it
stands, this bug keeps me from booting one of my diskless clients, which
gives me an incentive. :-)
> Are you using softdeps on the server, btw?
No. I've never even tried to use softdeps, anywhere.
der Mouse
mouse@rodents.montreal.qc.ca
7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B