Subject: kern/25663: vrele: bad ref count panic when exec_sigcode_map() in sys_execve() fails
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mhitch@netbsd.org>
List: netbsd-bugs
Date: 05/21/2004 20:14:26
>Number: 25663
>Category: kern
>Synopsis: When exec_sigcode_map() in sys_execve() fails, a vrele: bad ref count panic can occur.
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat May 22 02:15:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Michael L. Hitch
>Release: NetBSD 1.6ZC
>Organization:
>Environment:
System: NetBSD netbsd0 1.6ZC NetBSD 1.6ZC (GENERIC.MP) #23: Fri May 21 11:18:40 MDT 2004 mhitch@:/usr/staff/mhitch/200310030000/src/sys/arch/alpha/compile/GENERIC.MP alpha
Architecture: alpha
Machine: alpha
>Description:
Since the beginning of October 2003, attempts to run an OSF dynamic image
on an alpha will panic the system:
$ ../setiathome -version
vrele: bad ref count: tag 1 type VREG, usecount -1, writecount 0, refcount 1,
tag VT_UFS, ino 3096329, on dev 8, 3 flags 0x10, effnlink 1, nlink 1
mode 0100555, owner 271, group 0, size 204800 not locked
panic: vrele: ref cnt vp 0xfffffc00025073a8
I ran into this problem several months ago, but didn't have the ambition
to track it down. While troubleshooting another problem, I needed to run
a -current system and decided to dig into this panic after fixing the
other problem.
The panic occurs when sys_exit() is freeing up process resources
and does a vrele(p->p_textvp). The reference count is already zero at
that point, so the vrele() drives it negative, and a kernel compiled
with DIAGNOSTIC panics when it detects that.
I started tracking the uses of p_textvp and found that sys_execve()
was setting p_textvp to the vnode of the program and then failing the
exec_sigcode_map() call. The failure cleanup does a vput() on the
vnode, which drops the reference count to zero, but p_textvp still
points at it. When the process that attempted the execve() exits, it
tries to vrele() the vnode of the attempted program.
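
The sequence described above can be illustrated with a small
user-space model. This is only a sketch: the toy_* types and functions
are hypothetical stand-ins, not the real NetBSD structures, and the
model assumes (following the account above) that a single reference
is held on the vnode when the failure cleanup runs. Locking and all
other reference holders are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vnode and struct proc. */
struct toy_vnode { int usecount; };
struct toy_proc  { struct toy_vnode *textvp; };

/* Models vput()/vrele(): drop one reference (no locking modeled). */
static void toy_vrele(struct toy_vnode *vp)
{
    assert(vp != NULL);
    vp->usecount--;
}

/*
 * Models the failing exec path: the vnode has been recorded in
 * p_textvp, a later step fails, and the cleanup drops the reference.
 * If clear_textvp is nonzero, the stale pointer is also cleared.
 */
static void toy_failed_exec(struct toy_proc *p, struct toy_vnode *vp,
    int clear_textvp)
{
    p->textvp = vp;           /* recorded before the failing step */
    /* ... the step standing in for exec_sigcode_map() fails ... */
    toy_vrele(vp);            /* failure cleanup */
    if (clear_textvp && p->textvp == vp)
        p->textvp = NULL;     /* do not leave a stale pointer behind */
}

/* Models process exit: release the text vnode if one is recorded. */
static void toy_exit(struct toy_proc *p)
{
    if (p->textvp != NULL) {
        toy_vrele(p->textvp);
        p->textvp = NULL;
    }
}

/* Returns the vnode's final usecount after a failed exec and exit. */
static int toy_scenario(int clear_textvp)
{
    struct toy_vnode vp = { 1 };   /* one reference held at failure */
    struct toy_proc p = { NULL };

    toy_failed_exec(&p, &vp, clear_textvp);
    toy_exit(&p);
    return vp.usecount;
}
```

In this model toy_scenario(0) ends with a usecount of -1, the value
shown in the panic message, while toy_scenario(1) ends at 0.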
>How-To-Repeat:
Build an alpha kernel with DIAGNOSTIC from sources after October 3, 2003.
Run an OSF dynamically linked program.
Crash!
>Fix:
The following patch is the workaround I came up with initially. It
clears p_textvp on a failure if it's the same value as the vnode
sys_execve() is attempting to execute. [There are other paths to
the failure code on which p_textvp may still contain the vnode of
the currently running program. If that is the same vnode as the one
being executed, this patch will not do the correct thing. A better
fix might be to do the exec_sigcode_map() call a little earlier,
before the vnode is recorded in p_textvp. It doesn't immediately
look like that would cause any problems, but I don't know the code
well enough to be sure. However, there's another potential path
to exec_abort right after that, when checking for a setuid/setgid
program, which would then cause the same panic. An alternate
entry label just before exec_abort, used by errors that occur after
p_textvp is set and which clears p_textvp, would probably do the
right thing.]
Index: sys/kern/kern_exec.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_exec.c,v
retrieving revision 1.185
diff -u -r1.185 kern_exec.c
--- sys/kern/kern_exec.c 26 Mar 2004 17:13:37 -0000 1.185
+++ sys/kern/kern_exec.c 22 May 2004 01:28:16 -0000
@@ -874,6 +874,8 @@
 	vn_lock(pack.ep_vp, LK_EXCLUSIVE | LK_RETRY);
 	VOP_CLOSE(pack.ep_vp, FREAD, cred, p);
 	vput(pack.ep_vp);
+	if (p->p_textvp == pack.ep_vp)
+		p->p_textvp = NULL;
 	uvm_km_free_wakeup(exec_map, (vaddr_t) argp, NCARGS);
 	free(pack.ep_hdr, M_EXEC);
 	exit1(l, W_EXITCODE(error, SIGABRT));
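
The alternate entry label mentioned in the bracketed note could look
roughly like the following user-space sketch. It is not a patch
against kern_exec.c: the sk_* types, the sk_execve() function, the
fail_point enum and the exec_abort_textvp label name are all
hypothetical, and the reference counting is simplified to one counter
with no locking. The point is only that errors before p_textvp is
changed jump to exec_abort, and errors after it jump to a label that
clears p_textvp first, so no pointer comparison is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vnode and struct proc. */
struct sk_vnode { int usecount; };
struct sk_proc  { struct sk_vnode *textvp; };

/* Where the simulated exec fails, if at all. */
enum fail_point { FAIL_NONE, FAIL_EARLY, FAIL_LATE };

static int sk_execve(struct sk_proc *p, struct sk_vnode *newvp,
    enum fail_point fail)
{
    assert(p != NULL && newvp != NULL);

    newvp->usecount++;            /* reference held during the exec */

    if (fail == FAIL_EARLY)
        goto exec_abort;          /* textvp still names the old image */

    if (p->textvp != NULL)
        p->textvp->usecount--;    /* release the old image's vnode */
    p->textvp = newvp;            /* record the new image's vnode */

    if (fail == FAIL_LATE)
        goto exec_abort_textvp;   /* e.g. mapping the sigcode failed */

    return 0;                     /* success: textvp keeps the reference */

exec_abort_textvp:
    p->textvp = NULL;             /* the reference below is the last one */
exec_abort:
    newvp->usecount--;            /* failure cleanup drops the reference */
    return -1;
}
```

In this model a late failure leaves textvp NULL with the count back at
its starting value, and an early failure leaves textvp untouched even
when the process tries to exec the image it is already running, which
is the case the pointer comparison in the workaround cannot
distinguish.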
>Release-Note:
>Audit-Trail:
>Unformatted: