Subject: Re: Followup to #5073
To: Dominic J Hulewicz <dom@inta.net>
From: Greg Wohletz <greg@duke.CS.UNLV.EDU>
List: current-users
Date: 07/29/1998 02:17:46
>I reported a problem (#5073) back in February about a NetBSD 1.3 i386
>host that regularly panics with "vrele: ref cnt". After swapping out
>all the hardware and completely reinstalling with 1.3.2, the problem
>still occurred so I knew that it must be something I was running.
>
>I eventually traced it to an accidental duplication of rsync processes.
>I have two scripts on another machine sending data via rsync+ssh to the
>NetBSD host, one sends at two minute intervals, the other every five
>minutes. At some point the two separate copy commands had been merged
>into a single script, but the cron entry for the second script had not
>been removed. This meant that every ten minutes, two rsync processes
>would collide and try to mirror the same directory to the same machine.
>
>I guess the (bad) luck of the machine panicking is down to some sort of
>race condition / timing issue that occurs. I would imagine the problem
>could be easily replicated by setting up two or more timed rsync
>sessions set to go off at the same time.
>
I'm not doing anything of this sort; my system is just an NFS server for 50
or so Unix workstations. It gets these panics quite frequently. Doing a
dump of the filesystem seems to greatly increase the odds that a panic will
occur. I also submitted a PR (#5026).

Looking at the -current source, it appears that the vget/vrele code is
undergoing significant change; maybe the new implementation will manage to
avoid whatever this mysterious problem is.

If anyone is curious, what I have discovered about this bug is included in
the PR. I also have various core dumps / kernels available at
http://www.egr.unlv.edu/~greg/netbsd/ if folks want to look at them.

Currently I have a really ugly workaround in place for this panic, which
seems to be working out OK, but it won't really get tested until the fall
semester starts...

--Greg