Subject: Re: supervisor trap page fault in lfs_putpages
To: None <current-users@NetBSD.org>
From: Paul Ripke <stix@stix.id.au>
List: current-users
Date: 12/01/2006 22:28:07
The saga continues...
As suggested, I tried a dump and restore, with a
"newfs_lfs -A /dev/rld0g" in between.
I managed to get a hang during the restore: the system would answer
pings and I could switch text VTs, but that's all. So I jumped into ddb,
got a backtrace, continued, dropped back into ddb, got another backtrace,
and so on, about 20 times over roughly 10 minutes. During that time the
system did not appear to make any forward progress ("systat vm 1" in
another VT didn't budge).
Every backtrace showed lfs_writer -> lfs_flush_pchain(), at offsets
between +0x110 and +0x127, which correspond to the following code
in lfs_vnops.c according to gdb:
    /*
     * lfs_writevnodes, optimized to clear pageout requests.
     * Only write non-dirop files that are in the pageout queue.
     * We're very conservative about what we write; we want to be
     * fast and async.
     */
    simple_lock(&fs->lfs_interlock);
  top:
    for (ip = TAILQ_FIRST(&fs->lfs_pchainhd); ip != NULL; ip = nip) {  /* +0x110 */
        nip = TAILQ_NEXT(ip, i_lfs_pchain);
        vp = ITOV(ip);
        if (!(ip->i_flags & IN_PAGING))
            goto top;                                                  /* +0x127 */
        if (vp->v_flag & (VXLOCK|VDIROP))
            continue;
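I don't pretend to understand the guts of this, but is it possible
it was stuck taking the goto every time? That is, if an inode on the
pchain doesn't have IN_PAGING set, the loop jumps back to the top and,
unless something else changes that inode or the list, reaches the same
inode and the same goto again. As before, I have a core and netbsd.gdb.
Oh, and I tried this twice, so it wasn't a fluke. Google turned up
nothing, and I couldn't find a PR.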
--
stix