Subject: Re: Processes getting stuck in disk wait on current currents
To: Mark Davies <mark@MCS.VUW.AC.NZ>
From: Wayne Knowles <wdk@frc.niwa.cri.nz>
List: current-users
Date: 02/19/2001 00:28:53
On Sun, 18 Feb 2001, Wayne Knowles wrote:
> On Sun, 18 Feb 2001, Mark Davies wrote:
>
> > From: Wayne Knowles <wdk@frc.niwa.cri.nz>
> > Date: Sat, 17 Feb 2001 23:30:49 +1300 (NZDT)
> >
> > > My next step is to try and get closer to the date the problem was
> > > introduced - unfortunately that can be a time consuming process!
> >
> > I think I mentioned this in my original post, if not...
> > I don't see the problem in a Feb 4 build and do in Feb 9.
>
> Hi Mark,
>
> I have narrowed it down a little closer. Kernels after Feb 5th have the
> problem. From what I can tell it was introduced with rev 1.90 of
> sys/uvm/uvm_map.c as I can backout that patch on a Feb 5th kernel and the
> problem goes away.
>
> Also with DEBUG and DIAGNOSTIC enabled in the kernel I get the following
> error several minutes before the freeze:
>
> vn_flush: oor vp 0x81c84580 start 0x0 stop 0x100e4000 size 0x1000
>
> Next step is to update the kernel back to -current and back out the
> suspected patch. Will try and get a stack traceback at the time of the
> vn_flush error which may be helpful to the VM folk. Will also record
> UVMHIST information. Hope to have some more info in a few hours.

Mark,

Turns out that it wasn't uvm_map.c at all. I'm currently running
an up to date -current with 2 files reverted back to Feb 4th:

    ufs/ffs/ffs_alloc.c (rev 1.40)
    miscfs/genfs/genfs_vnops.c (rev 1.25)

The ffs_alloc.c changes look straightforward so I doubt whether they are
the cause. Rev 1.26 of genfs_vnops.c has a number of changes:
RCS file: /cvsroot/syssrc/sys/miscfs/genfs/genfs_vnops.c,v
Working file: genfs_vnops.c
head: 1.28
branch:
locks: strict
access list:
keyword substitution: kv
total revisions: 45; selected revisions: 1
description:
----------------------------
revision 1.26
date: 2001/02/05 12:26:08; author: chs; state: Exp; lines: +47 -26
fix several bugs:
 - in the cases where we skip over the i/o loop, increment npages by ridx
   so that when the cleanup code starts processing the pgs array at index 0
   it'll actually process all of the pages.
 - process the PG_RELEASED flag when unbusying pages.
 - add some missing MP locking.
 - use MIN() and MAX() instead of min() and max() since the latter are
   functions which take arguments of type "int" but we call them with
   values of type "off_t", so the values could be truncated.
 - in the PGO_PASTEOF case, use the larger of the current file size and the
   end of the requested range of pages as the file size for this request.
   this fixes some problems with sparsing writes to large offsets.
=============================================================================
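
As an aside on the MIN()/MAX() item in the log above, here is a minimal
sketch of how an int-taking min() can go wrong with 64-bit offsets. It
is not taken from genfs_vnops.c: int_min(), clamp_with_function() and
clamp_with_macro() are made-up names, and int64_t stands in for off_t.
It also assumes the usual compiler behaviour where an out-of-range
conversion to int keeps only the low 32 bits (the C standard leaves
that implementation-defined).

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that takes "int" arguments,
 * as described in the commit log. */
static int int_min(int a, int b)
{
	return a < b ? a : b;
}

/* Type-generic macro: the comparison is done in the operands' own
 * type, so 64-bit values are compared as 64-bit values. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Narrowing path: both offsets are converted to int before the
 * comparison, so anything above INT_MAX is mangled first. */
static int64_t clamp_with_function(int64_t off, int64_t limit)
{
	return int_min((int)off, (int)limit);
}

/* Macro path: no narrowing takes place. */
static int64_t clamp_with_macro(int64_t off, int64_t limit)
{
	return MIN(off, limit);
}
```

With an offset of 4GB + 4KB and a limit of 8KB, the macro version
returns the 8KB limit, while the function version compares the
truncated offset (4KB on typical compilers) and returns that instead.
For values that fit in an int, both give the same answer.
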

One of the above changes is causing "pax -zrvpe" to sleep on wchan
"uvn_fp2" and never wake up when installing a system via sysinst.

The process list from DDB shows the following:

 PID  PPID  PGRP  UID  S    FLAGS  COMMAND      WAIT
  90    88    88    0  3   0x4086  gzip         netio
  89    88    22    0  3     0x86  sysinst      ttyin
  88    22    88    0  3   0x4006  pax          uvn_fp2
  22    16    22    0  3   0x4086  sysinst      select
  16     1    16    0  3   0x4086  sh           wait
   5     0     0    0  3  0x20204  aiodoned     aiodone
   4     0     0    0  3  0x20204  ioflush      syncer
   3     0     0    0  3  0x20204  reaper       reaper
   2     0     0    0  3  0x20204  pagedaemon   pgdaemo
   1     0     1    0  3   0x4084  init         wait
   0    -1     0    0  3  0x20204  swapper      schedul
db>

Do you care to comment, Chuck?

Wayne
--
_____ Wayne Knowles, Systems Manager
/ o \/ National Institute of Water & Atmospheric Research Ltd
\/ v /\ P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
`---' Email: w.knowles@niwa.cri.nz