Subject: Re: Processes getting stuck in disk wait on current currents
To: Wayne Knowles <wdk@frc.niwa.cri.nz>
From: Chuck Silvers <chuq@chuq.com>
List: current-users
Date: 02/18/2001 07:13:34
hi,

I just checked in a fix for the bit with processes getting stuck with
"uvn_fp2", please update genfs_vnops.c and let me know if it still
happens.  this bug was actually there all along, so I'm not sure
why people only started seeing it recently.  I'm guessing that the
previous genfs_vnops.c fix changed whatever code was masking this
problem.

anyway, if there are still problems with rev. 1.29, let me know and
I'll look for yet more bugs.

-Chuck


On Mon, Feb 19, 2001 at 12:28:53AM +1300, Wayne Knowles wrote:
> On Sun, 18 Feb 2001, Wayne Knowles wrote:
> 
> > On Sun, 18 Feb 2001, Mark Davies wrote:
> > 
> > > 	From:  Wayne Knowles <wdk@frc.niwa.cri.nz>
> > > 	Date:  Sat, 17 Feb 2001 23:30:49 +1300 (NZDT)
> > > 
> > > > My next step is to try and get closer to the date the problem was
> > > > introduced - unfortunately that can be a time consuming process!
> > > 
> > > I think I mentioned this in my original post, if not...
> > > I don't see the problem in a Feb 4 build and do in Feb 9.
> > 
> > Hi Mark,
> > 
> > I have narrowed it down a little closer.  Kernels after Feb 5th have the
> > problem.  From what I can tell it was introduced with rev 1.90 of
> > sys/uvm/uvm_map.c as I can back out that patch on a Feb 5th kernel and the
> > problem goes away.
> > 
> > Also with DEBUG and DIAGNOSTIC enabled in the kernel I get the following
> > error several minutes before the freeze:
> > 
> >   vn_flush: oor vp 0x81c84580 start 0x0 stop 0x100e4000 size 0x1000
> > 
> > Next step is to update the kernel back to -current and back out the
> > suspected patch.  Will try and get a stack traceback at the time of the
> > vn_flush error which may be helpful to the VM folk.  Will also record
> > UVMHIST information.   Hope to have some more info in a few hours.
> 
> Mark,
> 
> Turns out that it wasn't uvm_map.c at all.  I'm currently running
> an up to date -current with 2 files reverted back to Feb 4th:
> 
>  ufs/ffs/ffs_alloc.c		(rev 1.40)
>  miscfs/genfs/genfs_vnops.c	(rev 1.25)
> 
> The ffs_alloc.c changes look straightforward so I doubt whether they are
> the cause.  Rev 1.26 of genfs_vnops.c has a number of changes:
> 
> 
> RCS file: /cvsroot/syssrc/sys/miscfs/genfs/genfs_vnops.c,v
> Working file: genfs_vnops.c
> head: 1.28
> branch:
> locks: strict
> access list:
> keyword substitution: kv
> total revisions: 45;    selected revisions: 1
> description:
> ----------------------------
> revision 1.26
> date: 2001/02/05 12:26:08;  author: chs;  state: Exp;  lines: +47 -26
> fix several bugs:
>  - in the cases where we skip over the i/o loop, increment npages by ridx
>    so that when the cleanup code starts processing the pgs array at index 0
>    it'll actually process all of the pages.
>  - process the PG_RELEASED flag when unbusying pages.
>  - add some missing MP locking.
>  - use MIN() and MAX() instead of min() and max() since the latter are
>    functions which take arguments of type "int" but we call them with
>    values of type "off_t", so the values could be truncated.
>  - in the PGO_PASTEOF case, use the larger of the current file size and the
>    end of the requested range of pages as the file size for this request.
>    this fixes some problems with sparsing writes to large offsets.
> =============================================================================
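
The min()/MIN() item in the log above can be illustrated with a small
userland sketch.  This is not the genfs code; int_min(), OFF_MIN(),
offset_t and the clamp_* helpers are hypothetical names made up for the
example, and int64_t stands in for off_t.  It assumes a platform where
int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for off_t: a 64-bit signed file offset. */
typedef int64_t offset_t;

/* Function-style min() in the spirit of the log entry: takes "int". */
static int int_min(int a, int b)
{
	return a < b ? a : b;
}

/* Macro-style minimum: the comparison happens in the operands' own type. */
#define OFF_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Clamp via the int function.  A real call site would convert the
 * arguments implicitly; the casts here only make that conversion visible.
 * Offsets at or above 2^31 do not fit in a 32-bit int, so the result is
 * implementation-defined and in practice loses the high bits.
 */
static offset_t clamp_with_function(offset_t want, offset_t limit)
{
	return int_min((int)want, (int)limit);
}

/* Clamp via the macro: both operands stay 64-bit throughout. */
static offset_t clamp_with_macro(offset_t want, offset_t limit)
{
	return OFF_MIN(want, limit);
}

/* Compare both variants for offsets just past 4 GB. */
static void check_truncation(void)
{
	offset_t want = INT64_C(0x100001000);	/* 4 GB + 4 KB */
	offset_t limit = INT64_C(0x100002000);	/* 4 GB + 8 KB */

	/* The macro returns the true minimum. */
	assert(clamp_with_macro(want, limit) == want);
	/* The int function cannot represent it, so the value differs. */
	assert(clamp_with_function(want, limit) != want);
}
```

Calling check_truncation() from a test program exercises both paths.
For offsets that fit in an int the two variants agree, which would be
consistent with a bug like this only showing up at large offsets.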
> 
> One of the above changes is causing "pax -zrvpe" to sleep on wchan
> "uvn_fp2" and never wake up when installing a system via sysinst.
> 
> Process list of DDB shows the following:
> 
>  PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
>  90                88         88          0 3  0x4086             gzip   netio
>  89                88         22          0 3    0x86          sysinst   ttyin
>  88                22         88          0 3  0x4006              pax uvn_fp2
>  22                16         22          0 3  0x4086          sysinst  select
>  16                 1         16          0 3  0x4086               sh    wait
>  5                  0          0          0 3 0x20204         aiodoned aiodone
>  4                  0          0          0 3 0x20204          ioflush  syncer
>  3                  0          0          0 3 0x20204           reaper  reaper
>  2                  0          0          0 3 0x20204       pagedaemon pgdaemo
>  1                  0          1          0 3  0x4084             init    wait
>  0                 -1          0          0 3 0x20204          swapper schedul
> db>
> 
> Do you care to comment Chuck?
> 
> Wayne
> -- 
>   _____	   	Wayne Knowles,  Systems Manager
>  / o   \/   	National Institute of Water & Atmospheric Research Ltd
>  \/  v /\   	P.O. Box 14-901 Kilbirnie, Wellington, NEW ZEALAND
>   `---'     	Email:   w.knowles@niwa.cri.nz
> 
>