Subject: Re: Getting information out of crash dump
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 04/26/1999 14:25:54
On Mon, 26 Apr 1999, Chuck Silvers wrote:
> if this is the same "vrele: ref cnt" panic that was PR'd yesterday,
> I'm looking at it too.
I've been trying to track a similar problem down for the last week or so,
without success. Something somewhere is calling vrele on a vnode it didn't
vget. I've seen it in the context of vnd disks dying (VOP_BMAP gets back a
vnode with usecount == 0).
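To make the invariant explicit (a sketch, not code from the tree; the
locking flag to vget() is whatever the call site needs):

    struct vnode *vp;

    /* correct: every reference taken is dropped exactly once */
    if (vget(vp, LK_EXCLUSIVE) == 0) {
            /* ... use vp ... */
            vrele(vp);              /* balances the vget() */
    }

    /* the bug class I'm hunting: a vrele() with no matching
     * vget()/vref() steals someone else's reference, so the
     * rightful owner's later vrele() drives v_usecount below
     * zero and trips the "vrele: ref cnt" panic. */
    vrele(vp);                      /* WRONG if we never referenced vp */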
> in this dump, the rest of the stack trace is:
> frame ptr   pc          function
> 0xf3ef1e10  0xf0100f1d  calltrap
> 0xf3ef1e24  0xf018ba69  panic
> 0xf3ef1e50  0xf01a36c0  vrele+80
> 0xf3ef1f24  0xf01a8497  rename_files+959
> 0xf3ef1f3c  0xf01a80bd  sys_rename+21
> 0xf3ef1fa8  0xf027ed26  syscall+526
> 0xefbfdd64  0xf0100fc9  syscall1+31
>
> this corresponds to the vrele() in this bit at the end of
> kern/vfs_syscalls.c:rename_files()
>
> out1:
>         if (fromnd.ni_startdir)
>                 vrele(fromnd.ni_startdir);
>         FREE(fromnd.ni_cnd.cn_pnbuf, M_NAMEI);
>         return (error == -1 ? 0 : error);
While this code looks weird, I think it's ok. It's been that way for
five years and is the same in FreeBSD, so I doubt it's the problem. I
think something else vrele'd when it shouldn't have, and this is just
where it got caught.
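For anyone following along, my reading of why it balances (paraphrasing
the top of rename_files() from memory; check the tree before trusting
this): the SAVESTART flag asks namei() to keep a reference on
ni_startdir, and the vrele() at out1: is the matching release.

    NDINIT(&fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE,
        SCARG(uap, from), p);
    if ((error = namei(&fromnd)) != 0)
            return (error);
    /* SAVESTART => namei() left a reference on fromnd.ni_startdir;
     * the vrele() at out1: is the matching release. */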
So what could be holding onto vnode references?
I asked Charles about it, and he pointed to the name cache and the
buffer cache. But the buffer cache uses hold counts (not the usecount
that hit zero here), and the name cache doesn't touch the usecount on
the vnodes it caches.
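To spell out the distinction (a sketch from memory; the real entry
points are bgetvp()/brelvp() in kern/vfs_subr.c):

    /* buffer cache: buffers attach to a vnode via the hold count */
    bgetvp(vp, bp);         /* VHOLD(vp):    vp->v_holdcnt++ */
    brelvp(bp);             /* HOLDRELE(vp): vp->v_holdcnt-- */

    /* users of the vnode itself go through the use count */
    vref(vp);               /* vp->v_usecount++ */
    vrele(vp);              /* vp->v_usecount--; panics if already 0 */

So neither cache should be able to inflate the usecount that blew up
here.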
I'm asking because I've seen something bumping the count far beyond
what I can account for, and I suspect that whatever it is is getting
things wrong.
Here's what I saw:
I changed vndstrategy() so that it vprints the vnode it gets back from
VOP_BMAP. I had a filesystem on /dev/sd0d mounted at /TEST, and the vnd
was configured with /dev/vnd0d on /TEST/bill/foo. VOP_BMAP is designed
to map file blocks (blocks in the vnd device) to blocks on the
underlying device (/dev/sd0d).
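The instrumentation was roughly this (paraphrased; the actual variable
names in dev/vnd.c differ, and I'm eliding the block-size arithmetic):

    /* in vndstrategy(), after mapping through the file's vnode: */
    error = VOP_BMAP(vnd->sc_vp, bn, &vp, &nbn, &nra);
    if (error == 0 && vp != NULL)
            vprint("vndstrategy: underlying vnode", vp);

vprint() dumps the vnode's type, usecount, and hold count, which is
where the numbers below come from.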
Since I had just the one filesystem mounted, I expected the usecount to
be around 1 or 2. In some runs I got 4, then 5, then 6. In other runs I
got numbers around 50, and in others around 200. That's 200 users of
/dev/sd0d!
Does anyone know what other caching would be doing this?
Take care,
Bill