Subject: Re: Getting information out of crash dump
To: Chuck Silvers <chuq@chuq.com>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 04/26/1999 14:25:54
On Mon, 26 Apr 1999, Chuck Silvers wrote:
> if this is the same "vrele: ref cnt" panic that was PR'd yesterday,
> I'm looking at it too.
I've been trying to track a similar problem down for the last week or so,
without success. Something somewhere is calling vrele on a vnode it didn't
vget. I've seen it in the context of vnd disks dying (VOP_BMAP gets back a
vnode with usecount == 0).
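To make the invariant explicit (a sketch, not code from the tree; the
locking flag to vget() is whatever the call site needs):

    struct vnode *vp;

    /* correct: every reference taken is dropped exactly once */
    if (vget(vp, LK_EXCLUSIVE) == 0) {
            /* ... use vp ... */
            vrele(vp);              /* balances the vget() */
    }

    /* the bug class I'm hunting: a vrele() with no matching
     * vget()/vref() steals someone else's reference, so the
     * rightful owner's later vrele() drives v_usecount below
     * zero and trips the "vrele: ref cnt" panic. */
    vrele(vp);                      /* WRONG if we never referenced vp */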
> in this dump, the rest of the stack trace is:
> frame ptr   pc          function
> 0xf3ef1e10  0xf0100f1d  calltrap
> 0xf3ef1e24  0xf018ba69  panic
> 0xf3ef1e50  0xf01a36c0  vrele+80
> 0xf3ef1f24  0xf01a8497  rename_files+959
> 0xf3ef1f3c  0xf01a80bd  sys_rename+21
> 0xf3ef1fa8  0xf027ed26  syscall+526
> 0xefbfdd64  0xf0100fc9  syscall1+31
>
> this corresponds to the vrele() in this bit at the end of
> kern/vfs_syscalls.c:rename_files()
>
> out1:
>         if (fromnd.ni_startdir)
>                 vrele(fromnd.ni_startdir);
>         FREE(fromnd.ni_cnd.cn_pnbuf, M_NAMEI);
>         return (error == -1 ? 0 : error);
While this code looks weird, I think it's ok. It's been that way for
five years and is the same in FreeBSD, so I doubt it's the problem. I
think something else vrele'd when it shouldn't have, and this is just
where it got caught.
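For anyone following along, my reading of why it balances (paraphrasing
the top of rename_files() from memory; check the tree before trusting
this): the SAVESTART flag asks namei() to keep a reference on
ni_startdir, and the vrele() at out1: is the matching release.

    NDINIT(&fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE,
        SCARG(uap, from), p);
    if ((error = namei(&fromnd)) != 0)
            return (error);
    /* SAVESTART => namei() left a reference on fromnd.ni_startdir;
     * the vrele() at out1: is the matching release. */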
So what could be holding onto vnode references?
I asked Charles about it, and he pointed to the name cache and the
buffer cache. But the buffer cache uses hold counts (not the usecount
that hit zero here), and the name cache doesn't touch the usecount on
the vnodes it caches.
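To spell out the distinction (a sketch from memory; the real entry
points are bgetvp()/brelvp() in kern/vfs_subr.c):

    /* buffer cache: buffers attach to a vnode via the hold count */
    bgetvp(vp, bp);         /* VHOLD(vp):    vp->v_holdcnt++ */
    brelvp(bp);             /* HOLDRELE(vp): vp->v_holdcnt-- */

    /* users of the vnode itself go through the use count */
    vref(vp);               /* vp->v_usecount++ */
    vrele(vp);              /* vp->v_usecount--; panics if already 0 */

So neither cache should be able to inflate the usecount that blew up
here.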
I'm asking because I've seen something bumping the count far beyond
what I can account for, and I suspect that whatever it is is getting
things wrong.
Here's what I saw:
I changed vndstrategy() so that it vprints the vnode it gets back from
VOP_BMAP. I had a filesystem on /dev/sd0d mounted at /TEST, and the vnd
was configured with /dev/vnd0d on /TEST/bill/foo. VOP_BMAP is designed
to map file blocks (blocks in the vnd device) to blocks on the
underlying device (/dev/sd0d).
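The instrumentation was roughly this (paraphrased; the actual variable
names in dev/vnd.c differ, and I'm eliding the block-size arithmetic):

    /* in vndstrategy(), after mapping through the file's vnode: */
    error = VOP_BMAP(vnd->sc_vp, bn, &vp, &nbn, &nra);
    if (error == 0 && vp != NULL)
            vprint("vndstrategy: underlying vnode", vp);

vprint() dumps the vnode's type, usecount, and hold count, which is
where the numbers below come from.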
Since I had just the one filesystem mounted, I expected the usecount to
be around 1 or 2. In some runs I got 4, then 5, then 6. In other runs I
got numbers around 50, and in others around 200. That's 200 users of
/dev/sd0d!
Does anyone know what other caching would be doing this?
Take care,
Bill