Subject: Re: nfs vs pagedaemon
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 04/08/2003 10:00:40
--FL5UXtIhxfXey3p5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
hi,
yea, this makes sense. the more general description of the problem
is that nfs_strategy() should only use the commit mechanism when it's
being called from VOP_PUTPAGES() on an NFS vnode, but there are other
ways that nfs_strategy() can be called. NFS swap files is one way we
can go wrong, but there are others, such as using an NFS file for
a vnd backing file. one way to fix this more completely would be to
have the NFS VOP_PUTPAGES() path add an extra flag to its buffers
that nfs_strategy() can look at to decide if it should use stable writes.
I had written this up a while back but didn't commit it for some reason.
the diff is attached if you want to pursue it.
to allow unstable writes for NFS swap files, it may be possible to
transfer ownership of the pages being paged out from the original
anon or aobj to the NFS vnode before sending the data to the server.
I haven't thought it through though, so this may not work out very well.
-Chuck
On Mon, Apr 07, 2003 at 10:50:16PM +0900, YAMAMOTO Takashi wrote:
> i found the problem.
> because the file offset that corresponds to the page changes
> in the case of swap and we're using byte-range of the file
> to track commited/uncommited, we can't do unstable writes+commit for swap.
>
> the attached patch simply attempts stable writes in the case of pageout.
> i'll commit this if no one objects.
> thanks.
>
> YAMAMOTO Takashi
--FL5UXtIhxfXey3p5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.nfs-unstable"
Index: kern/vfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_bio.c,v
retrieving revision 1.85
diff -u -r1.85 vfs_bio.c
--- kern/vfs_bio.c 2002/09/06 13:23:52 1.85
+++ kern/vfs_bio.c 2003/01/01 23:18:51
@@ -424,7 +424,7 @@
}
/* Otherwise, the "write" is done, so mark and release the buffer. */
- CLR(bp->b_flags, B_NEEDCOMMIT|B_DONE);
+ CLR(bp->b_flags, B_DONE);
splx(s);
brelse(bp);
Index: miscfs/genfs/genfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/genfs_vnops.c,v
retrieving revision 1.68
diff -u -r1.68 genfs_vnops.c
--- miscfs/genfs/genfs_vnops.c 2002/11/15 14:01:57 1.68
+++ miscfs/genfs/genfs_vnops.c 2003/01/01 23:18:53
@@ -1392,7 +1392,8 @@
mbp->b_bufsize = npages << PAGE_SHIFT;
mbp->b_data = (void *)kva;
mbp->b_resid = mbp->b_bcount = bytes;
- mbp->b_flags = B_BUSY|B_WRITE|B_AGE| (async ? (B_CALL|B_ASYNC) : 0);
+ mbp->b_flags = B_BUSY | B_WRITE | B_AGE | B_PAGEIO |
+ (async ? (B_CALL|B_ASYNC) : 0);
mbp->b_iodone = uvm_aio_biodone;
mbp->b_vp = vp;
LIST_INIT(&mbp->b_dep);
@@ -1430,7 +1431,8 @@
bp->b_data = (char *)kva +
(vaddr_t)(offset - pg->offset);
bp->b_resid = bp->b_bcount = iobytes;
- bp->b_flags = B_BUSY|B_WRITE|B_CALL|B_ASYNC;
+ bp->b_flags = B_BUSY | B_WRITE | B_CALL | B_ASYNC |
+ B_PAGEIO;
bp->b_iodone = uvm_aio_biodone1;
bp->b_vp = vp;
LIST_INIT(&bp->b_dep);
@@ -1633,7 +1635,7 @@
bp = pool_get(&bufpool, PR_WAITOK);
splx(s);
- bp->b_flags = B_BUSY | B_WRITE | B_AGE;
+ bp->b_flags = B_BUSY | B_WRITE | B_AGE | B_PAGEIO;
bp->b_vp = vp;
bp->b_lblkno = offset >> vp->v_mount->mnt_fs_bshift;
bp->b_data = (char *)kva;
Index: nfs/nfs_bio.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_bio.c,v
retrieving revision 1.85
diff -u -r1.85 nfs_bio.c
--- nfs/nfs_bio.c 2002/10/29 10:15:16 1.85
+++ nfs/nfs_bio.c 2003/01/01 23:18:57
@@ -985,21 +985,25 @@
bp->b_error = error;
}
} else {
- int i, npages = bp->b_bufsize >> PAGE_SHIFT;
+ int i, npages = (bp->b_bcount + PAGE_MASK) >> PAGE_SHIFT;
struct vm_page *pgs[npages];
boolean_t needcommit = TRUE;
- if ((bp->b_flags & B_ASYNC) != 0 && NFS_ISV3(vp)) {
+ if ((~bp->b_flags & (B_ASYNC | B_PAGEIO)) == 0 &&
+ NFS_ISV3(vp)) {
iomode = NFSV3WRITE_UNSTABLE;
} else {
iomode = NFSV3WRITE_FILESYNC;
+ needcommit = FALSE;
}
- for (i = 0; i < npages; i++) {
- pgs[i] = uvm_pageratop((vaddr_t)bp->b_data +
- (i << PAGE_SHIFT));
- if ((pgs[i]->flags & PG_NEEDCOMMIT) == 0) {
- needcommit = FALSE;
+ if (needcommit) {
+ for (i = 0; i < npages; i++) {
+ pgs[i] = uvm_pageratop((vaddr_t)bp->b_data +
+ (i << PAGE_SHIFT));
+ if ((pgs[i]->flags & PG_NEEDCOMMIT) == 0) {
+ needcommit = FALSE;
+ }
}
}
if (!needcommit && iomode == NFSV3WRITE_UNSTABLE) {
Index: sys/buf.h
===================================================================
RCS file: /cvsroot/src/sys/sys/buf.h,v
retrieving revision 1.55
diff -u -r1.55 buf.h
--- sys/buf.h 2002/10/06 17:05:56 1.55
+++ sys/buf.h 2003/01/01 23:18:57
@@ -193,7 +193,7 @@
* These flags are kept in b_flags.
*/
#define B_AGE 0x00000001 /* Move to age queue when I/O done. */
-#define B_NEEDCOMMIT 0x00000002 /* Needs committing to stable storage */
+#define B_PAGEIO 0x00000002 /* Data is file system page cache */
#define B_ASYNC 0x00000004 /* Start I/O, do not wait. */
#define B_BAD 0x00000008 /* Bad block revectoring in progress. */
#define B_BUSY 0x00000010 /* I/O in progress. */
--FL5UXtIhxfXey3p5--