Subject: kern/14090: "panic: lockmgr: locking against myself" with nullfs
To: None <gnats-bugs@gnats.netbsd.org>
From: None <apb@cequrux.com>
List: netbsd-bugs
Date: 09/28/2001 17:16:55
>Number: 14090
>Category: kern
>Synopsis: "panic: lockmgr: locking against myself" with nullfs
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Sep 28 08:21:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Alan Barrett
>Release: NetBSD-current 'Sun Sep 23 21:12:12 EDT 2001'
>Organization:
Not much
>Environment:
NetBSD/i386 1.5Y
Built from sources checked out from CVS with -D'Sun Sep 23 21:12:12 EDT 2001'
>Description:
This is a new problem in NetBSD-1.5Y. It was not present in a kernel
built on 5 Sep 2001 from sources that were probably a few days older
than that.
In an environment that makes heavy use of nullfs and raid0 filesystems,
one of the "find" commands run from the daily cron jobs causes a panic.
I don't yet know for sure which find command was responsible, but I
suspect the find|xargs|sort pipeline in the check_devices section of
/etc/security.
Here's the panic message and a hand-transcribed backtrace (with function
arguments omitted):
    panic: lockmgr: locking against myself
    stopped in pid 28479 (find) at cpu_Debugger+0x4: leave
    db> t
    cpu_Debugger(...) +0x4
    panic(...) +0xad
    lockmgr(...) +0x591
    layer_lock(...) +0x44
    VOP_LOCK(...) +0x2e
    vn_lock(...) +0x5d
    getnewvnode(...) +0x122
    ffs_vget(...) +0x4f
    ufs_lookup(...) +0x9bd
    VOP_LOOKUP(...) +0x35
    lookup(...) +0x236
    namei(...) +0x2f1
    sys___lstat13(...) +0x4f
    syscall_plain(...) +0xa7
    db>
The only printable string that I was easily able to discover was the
second arg passed to lookup(), and that string was "ircsearch". I
believe that two paths match that name: /r1a/USR-PKG/bin/ircsearch is
on an ordinary FFS filesystem on a raid0 partition, and
/usr/pkg/bin/ircsearch is a nullfs image of the same underlying file.
The /etc/fstab entries for the /r1a and /usr/pkg filesystems are as
follows:
    /dev/raid1a     /r1a        ffs     rw,softdep  1 3
    /r1a/USR-PKG    /usr/pkg    null    rw          0 0
>How-To-Repeat:
>Fix:
1) Fix the locking bug.
2) It might be a good idea to add "-o -fstype null" to the set of
exclusions in the find command in /etc/security. There's no point
in walking both the nullfs tree and the underlying tree.
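A minimal sketch of suggestion 2, assuming a find(1) that supports
-fstype and -prune. The function name list_files_skip_null is
hypothetical, and the real exclusion list in /etc/security is not quoted
in this report, so only the "-fstype null" term comes from the text
above:

```shell
#!/bin/sh
# Hypothetical helper (not the actual /etc/security code): list regular
# files under the given root, pruning any subtree that lives on a nullfs
# mount so that a null-mounted tree and its underlying tree are not both
# walked.
list_files_skip_null() {
    # (on a null filesystem AND prune it) OR (regular file AND print it)
    find "$1" \( -fstype null \) -prune -o -type f -print
}
```

With the fstab entries shown earlier, something like
"list_files_skip_null / | sort" would presumably visit
/r1a/USR-PKG/bin/ircsearch once and skip the /usr/pkg image of it. This
only avoids the double walk; it does not address the locking bug itself.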
>Release-Note:
>Audit-Trail:
>Unformatted: