Source-Changes-HG archive
[src/trunk]: src/sys/ufs Various minor LFS improvements:
details: https://anonhg.NetBSD.org/src/rev/ac03fa3eca05
branches: trunk
changeset: 574261:ac03fa3eca05
user: perseant <perseant%NetBSD.org@localhost>
date: Sat Feb 26 05:40:42 2005 +0000
description:
Various minor LFS improvements:
* Note when lfs_putpages(9) thinks it is not going to be writing any
pages before calling genfs_putpages(9). This prevents a situation in
which blocks can be queued for writing without a segment header.
* Correct computation of NRESERVE(), though it is still a gross
overestimate in most cases. Note that if NRESERVE() is too high, it
may be impossible to create files on the filesystem. We catch this
case on filesystem mount and refuse to mount r/w.
* Allow filesystems to be mounted whose block size is == MAXBSIZE.
* Somewhere along the line, ufs_bmaparray(9) started mangling UNWRITTEN
entries in indirect blocks again, triggering a failed assertion "daddr
<= LFS_MAX_DADDR". Explicitly convert to and from int32_t to correct
this.
* Add a high-water mark for the number of dirty pages any given LFS can
hold before triggering a flush. This is settable by sysctl, but off
(zero) by default.
* Be more careful about the MAX_BYTES and MAX_BUFS computations so we
shouldn't see "please increase to at least zero" messages.
* Note that VBLK and VCHR vnodes can have nonzero values in di_db[0]
even though their v_size == 0. Don't panic when we see this.
* Change lfs_bfree to a signed quantity. The manner in which it is
processed before being passed to the cleaner means that sometimes it
may drop below zero, and the cleaner must be aware of this.
* Never report bfree < 0 (or higher than lfs_dsize) through
lfs_statvfs(9). This prevents df(1) from ever telling us that our full
filesystems have 16TB free.
* Account space allocated through lfs_balloc(9) that does not have
associated buffer headers, so that the pagedaemon doesn't run us out
of segments.
* Return ENOSPC from lfs_balloc(9) when bfree drops to zero.
* Address a deadlock in lfs_bmapv/lfs_markv when the filesystem is being
unmounted. Because vfs_busy() is a shared lock, and
lfs_bmapv/lfs_markv mark the filesystem vfs_busy(), the cleaner can be
holding the lock that umount() is blocking on, then try to vfs_busy()
again in getnewvnode().
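Two of the fixes above come down to signed-integer handling and can be sketched in isolation. The following is an illustrative userland sketch, not the committed kernel code. The sketch_ names, the 64-bit stand-in for daddr_t, the sentinel value -2 and the 0x7fffffff limit are assumptions made for the example. widen_signed mirrors the cast added in lfs_balloc.c, and report_bfree shows the kind of clamping described for lfs_statvfs(9).

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real types and constants live in the kernel. */
typedef int64_t sketch_daddr_t;                          /* assumed 64-bit daddr_t */
#define SKETCH_UNWRITTEN ((sketch_daddr_t)-2)            /* assumed sentinel value */
#define SKETCH_MAX_DADDR ((sketch_daddr_t)0x7fffffff)    /* assumed upper limit */

/*
 * Widen a raw 32-bit on-disk entry as if it were unsigned. A sentinel
 * stored as 0xfffffffe becomes a large positive address, which is the
 * kind of value that would trip "daddr <= LFS_MAX_DADDR".
 */
static sketch_daddr_t
widen_unsigned(uint32_t raw)
{
	return (sketch_daddr_t)raw;
}

/*
 * Round-trip through int32_t, as in the diff:
 *     daddr = (daddr_t)((int32_t)daddr);
 * The narrowing conversion is implementation-defined in C; common
 * compilers truncate modulo 2^32, so the sentinel's sign is restored.
 */
static sketch_daddr_t
widen_signed(sketch_daddr_t d)
{
	return (sketch_daddr_t)((int32_t)d);
}

/*
 * Clamp a signed free-block count into [0, dsize] before reporting it,
 * so a transiently negative bfree is never shown as a huge unsigned value.
 */
static int64_t
report_bfree(int64_t bfree, int64_t dsize)
{
	if (bfree < 0)
		return 0;
	if (bfree > dsize)
		return dsize;
	return bfree;
}
```

For scale: a slightly negative 32-bit count reinterpreted as unsigned is close to 2^32 blocks, and with an assumed 4 KB block size that is about 16 TB, which matches the df(1) symptom described above.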
diffstat:
sys/ufs/lfs/TODO | 9 +--
sys/ufs/lfs/lfs.h | 63 +++++++++++++----
sys/ufs/lfs/lfs_alloc.c | 9 +-
sys/ufs/lfs/lfs_balloc.c | 109 +++++++++++++++++++++++++++++++-
sys/ufs/lfs/lfs_bio.c | 102 ++++++++++++++++++++---------
sys/ufs/lfs/lfs_extern.h | 14 +++-
sys/ufs/lfs/lfs_segment.c | 17 ++--
sys/ufs/lfs/lfs_subr.c | 21 ++++-
sys/ufs/lfs/lfs_syscalls.c | 24 ++++++-
sys/ufs/lfs/lfs_vfsops.c | 150 +++++++++++++++++++++++++++++++------------
sys/ufs/lfs/lfs_vnops.c | 44 +++++++++--
sys/ufs/ufs/ufs_readwrite.c | 5 +-
12 files changed, 433 insertions(+), 134 deletions(-)
diffs (truncated from 1344 to 300 lines):
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/TODO
--- a/sys/ufs/lfs/TODO Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/TODO Sat Feb 26 05:40:42 2005 +0000
@@ -1,11 +1,7 @@
-# $NetBSD: TODO,v 1.7 2003/02/23 00:22:33 perseant Exp $
+# $NetBSD: TODO,v 1.8 2005/02/26 05:40:42 perseant Exp $
- Lock audit. Need to check locking for multiprocessor case in particular.
-- Get rid of the syscalls: make them into ioctl calls instead. This would
- allow LFS to be loaded as a module. We would then ideally have an
- in-kernel cleaner that runs if no userland cleaner has asserted itself.
-
- Get rid of lfs_segclean(); the kernel should clean a dirty segment IFF it
has passed two checkpoints containing zero live bytes.
@@ -23,9 +19,6 @@
locking problem in lfs_{bmapv,markv} goes away and lfs_reserve can go,
too.
-- Fully working fsck_lfs. (Really, need a general-purpose external
- partial-segment writer.)
-
- Get rid of DEV_BSIZE, pay attention to the media block size at mount time.
- More fs ops need to call lfs_imtime. Which ones? (Blackwell et al., 1995)
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs.h
--- a/sys/ufs/lfs/lfs.h Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs.h Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: lfs.h,v 1.74 2004/08/14 14:32:04 mycroft Exp $ */
+/* $NetBSD: lfs.h,v 1.75 2005/02/26 05:40:42 perseant Exp $ */
/*-
* Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -77,6 +77,7 @@
#define LFS_DEBUG_RFW /* print roll-forward debugging info */
#define LFS_LOGLENGTH 1024 /* size of debugging log */
#define LFS_MAX_ACTIVE 10 /* Dirty segments before ckp forced */
+#define LFS_PD /* pagedaemon codaemon */
/* #define DEBUG_LFS */ /* Intensive debugging of LFS subsystem */
@@ -111,9 +112,11 @@
/* Resource limits */
#define LFS_MAX_BUFS ((nbuf >> 2) - 10)
#define LFS_WAIT_BUFS ((nbuf >> 1) - (nbuf >> 3) - 10)
-extern u_long bufmem; /* XXX */
-#define LFS_MAX_BYTES ((bufmem >> 2) - 10 * PAGE_SIZE)
-#define LFS_WAIT_BYTES ((bufmem >> 1) - (bufmem >> 3) - 10 * PAGE_SIZE)
+#define LFS_INVERSE_MAX_BUFS(n) (((n) + 10) << 2)
+extern u_long bufmem_lowater, bufmem_hiwater; /* XXX */
+#define LFS_MAX_BYTES ((bufmem_lowater >> 2) - 10 * PAGE_SIZE)
+#define LFS_INVERSE_MAX_BYTES(n) (((n) + 10 * PAGE_SIZE) << 2)
+#define LFS_WAIT_BYTES ((bufmem_lowater >> 1) - (bufmem_lowater >> 3) - 10 * PAGE_SIZE)
#define LFS_MAX_DIROP ((desiredvnodes >> 2) + (desiredvnodes >> 3))
#define LFS_MAX_PAGES \
(((uvmexp.active + uvmexp.inactive + uvmexp.free) * uvmexp.filemin) >> 8)
@@ -121,7 +124,6 @@
(((uvmexp.active + uvmexp.inactive + uvmexp.free) * uvmexp.filemax) >> 8)
#define LFS_BUFWAIT 2 /* How long to wait if over *_WAIT_* */
-
/*
* Reserved blocks for lfs_malloc
*/
@@ -466,7 +468,7 @@
typedef struct _cleanerinfo {
u_int32_t clean; /* number of clean segments */
u_int32_t dirty; /* number of dirty segments */
- u_int32_t bfree; /* disk blocks free */
+ int32_t bfree; /* disk blocks free */
int32_t avail; /* disk blocks available */
u_int32_t free_head; /* head of the inode free list */
u_int32_t free_tail; /* tail of the inode free list */
@@ -487,9 +489,11 @@
/* Synchronize the Ifile cleaner info with current avail and bfree */
#define LFS_SYNC_CLEANERINFO(cip, fs, bp, w) do { \
if ((w) || (cip)->bfree != (fs)->lfs_bfree || \
- (cip)->avail != (fs)->lfs_avail - (fs)->lfs_ravail) { \
+ (cip)->avail != (fs)->lfs_avail - (fs)->lfs_ravail - \
+ (fs)->lfs_favail) { \
(cip)->bfree = (fs)->lfs_bfree; \
- (cip)->avail = (fs)->lfs_avail - (fs)->lfs_ravail; \
+ (cip)->avail = (fs)->lfs_avail - (fs)->lfs_ravail - \
+ (fs)->lfs_favail; \
if (((bp)->b_flags & B_GATHERED) == 0) \
(fs)->lfs_flags |= LFS_IFDIRTY; \
(void) LFS_BWRITE_LOG(bp); /* Ifile */ \
@@ -590,7 +594,7 @@
/* Checkpoint region. */
u_int32_t dlfs_freehd; /* 32: start of the free list */
- u_int32_t dlfs_bfree; /* 36: number of free disk blocks */
+ int32_t dlfs_bfree; /* 36: number of free disk blocks */
u_int32_t dlfs_nfiles; /* 40: number of allocated inodes */
int32_t dlfs_avail; /* 44: blocks available for writing */
int32_t dlfs_uinodes; /* 48: inodes in cache not yet on disk */
@@ -750,6 +754,7 @@
pid_t lfs_rfpid; /* Process ID of roll-forward agent */
int lfs_nadirop; /* number of active dirop nodes */
long lfs_ravail; /* blocks pre-reserved for writing */
+ long lfs_favail; /* blocks pre-reserved for writing */
res_t *lfs_resblk; /* Reserved memory for pageout */
TAILQ_HEAD(, inode) lfs_dchainhd; /* dirop vnodes */
TAILQ_HEAD(, inode) lfs_pchainhd; /* paging vnodes */
@@ -767,6 +772,7 @@
int lfs_cleanind; /* Index into intervals */
struct simplelock lfs_interlock; /* lock for lfs_seglock */
int lfs_sleepers; /* # procs sleeping this fs */
+ int lfs_pages; /* dirty pages blaming this fs */
};
/* NINDIR is the number of indirects in a file system block. */
@@ -899,20 +905,34 @@
#endif /* _KERNEL */
/*
- * LFS inode extensions; moved from <ufs/ufs/inode.h> so that file didn't
- * have to change every time LFS changed.
+ * List containing block numbers allocated through lfs_balloc.
+ */
+struct lbnentry {
+ LIST_ENTRY(lbnentry) entry;
+ daddr_t lbn;
+};
+
+/*
+ * LFS inode extensions.
*/
struct lfs_inode_ext {
off_t lfs_osize; /* size of file on disk */
u_int32_t lfs_effnblocks; /* number of blocks when i/o completes */
size_t lfs_fragsize[NDADDR]; /* size of on-disk direct blocks */
- TAILQ_ENTRY(inode) lfs_dchain; /* Dirop chain. */
- TAILQ_ENTRY(inode) lfs_pchain; /* Paging chain. */
+ TAILQ_ENTRY(inode) lfs_dchain; /* Dirop chain. */
+ TAILQ_ENTRY(inode) lfs_pchain; /* Paging chain. */
+ /* Blocks allocated for write */
+#define LFS_BLIST_HASH_WIDTH 17
+ LIST_HEAD(, lbnentry) lfs_blist[LFS_BLIST_HASH_WIDTH];
+#define LFSI_NO_GOP_WRITE 0x01
+ u_int32_t lfs_iflags; /* Inode flags */
};
#define i_lfs_osize inode_ext.lfs->lfs_osize
#define i_lfs_effnblks inode_ext.lfs->lfs_effnblocks
#define i_lfs_fragsize inode_ext.lfs->lfs_fragsize
#define i_lfs_dchain inode_ext.lfs->lfs_dchain
+#define i_lfs_blist inode_ext.lfs->lfs_blist
+#define i_lfs_iflags inode_ext.lfs->lfs_iflags
/*
* Macros for determining free space on the disk, with the variable metadata
@@ -927,7 +947,7 @@
#define LFS_EST_NONMETA(F) ((F)->lfs_dsize - (F)->lfs_dmeta - LFS_EST_CMETA(F))
/* Estimate number of blocks actually available for writing */
-#define LFS_EST_BFREE(F) ((F)->lfs_bfree - LFS_EST_CMETA(F) - (F)->lfs_dmeta)
+#define LFS_EST_BFREE(F) ((F)->lfs_bfree > LFS_EST_CMETA(F) + (F)->lfs_dmeta ? (F)->lfs_bfree - LFS_EST_CMETA(F) - (F)->lfs_dmeta : 0)
/* Amount of non-meta space not available to mortal man */
#define LFS_EST_RSVD(F) (int32_t)((LFS_EST_NONMETA(F) * \
@@ -944,6 +964,13 @@
#define IS_FREESPACE(F, BB) \
(LFS_EST_BFREE(F) >= (BB) + LFS_EST_RSVD(F))
+/*
+ * The minimum number of blocks to create a new inode. This is:
+ * directory direct block (1) + NIADDR indirect blocks + inode block (1) +
+ * ifile direct block (1) + NIADDR indirect blocks = 3 + 2 * NIADDR blocks.
+ */
+#define LFS_NRESERVE(F) (btofsb((F), (2 * NIADDR + 3) << (F)->lfs_bshift))
+
/* Statistics Counters */
struct lfs_stats {
u_int segsused;
@@ -970,11 +997,15 @@
int blkcnt; /* number of blocks */
};
-#define LFCNSEGWAITALL _FCNW_FSPRIV('L', 0, struct timeval)
-#define LFCNSEGWAIT _FCNW_FSPRIV('L', 1, struct timeval)
+#define LFCNSEGWAITALL _FCNR_FSPRIV('L', 0, struct timeval)
+#define LFCNSEGWAIT _FCNR_FSPRIV('L', 1, struct timeval)
#define LFCNBMAPV _FCNRW_FSPRIV('L', 2, struct lfs_fcntl_markv)
#define LFCNMARKV _FCNRW_FSPRIV('L', 3, struct lfs_fcntl_markv)
#define LFCNRECLAIM _FCNO_FSPRIV('L', 4)
+#define LFCNIFILEFH _FCNW_FSPRIV('L', 5, struct fhandle)
+/* Compat for NetBSD 2.x error */
+#define LFCNSEGWAITALL_COMPAT _FCNW_FSPRIV('L', 0, struct timeval)
+#define LFCNSEGWAIT_COMPAT _FCNW_FSPRIV('L', 1, struct timeval)
#ifdef _KERNEL
/* XXX MP */
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs_alloc.c
--- a/sys/ufs/lfs/lfs_alloc.c Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs_alloc.c Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: lfs_alloc.c,v 1.73 2004/08/14 01:08:03 mycroft Exp $ */
+/* $NetBSD: lfs_alloc.c,v 1.74 2005/02/26 05:40:42 perseant Exp $ */
/*-
* Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -67,7 +67,7 @@
*/
#include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.73 2004/08/14 01:08:03 mycroft Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_alloc.c,v 1.74 2005/02/26 05:40:42 perseant Exp $");
#if defined(_KERNEL_OPT)
#include "opt_quota.h"
@@ -422,9 +422,7 @@
struct inode *ip;
struct ufs1_dinode *dp;
struct ufsmount *ump;
-#ifdef QUOTA
int i;
-#endif
/* Get a pointer to the private mount structure. */
ump = VFSTOUFS(mp);
@@ -435,6 +433,9 @@
dp = pool_get(&lfs_dinode_pool, PR_WAITOK);
memset(dp, 0, sizeof(*dp));
ip->inode_ext.lfs = pool_get(&lfs_inoext_pool, PR_WAITOK);
+ memset(ip->inode_ext.lfs, 0, sizeof(*ip->inode_ext.lfs));
+ for (i = 0; i < LFS_BLIST_HASH_WIDTH; i++)
+ LIST_INIT(&(ip->i_lfs_blist[i]));
vp->v_data = ip;
ip->i_din.ffs1_din = dp;
ip->i_ump = ump;
diff -r 8f86e89ad850 -r ac03fa3eca05 sys/ufs/lfs/lfs_balloc.c
--- a/sys/ufs/lfs/lfs_balloc.c Sat Feb 26 02:57:32 2005 +0000
+++ b/sys/ufs/lfs/lfs_balloc.c Sat Feb 26 05:40:42 2005 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: lfs_balloc.c,v 1.48 2004/01/25 18:06:49 hannken Exp $ */
+/* $NetBSD: lfs_balloc.c,v 1.49 2005/02/26 05:40:42 perseant Exp $ */
/*-
* Copyright (c) 1999, 2000, 2001, 2002, 2003 The NetBSD Foundation, Inc.
@@ -67,7 +67,7 @@
*/
#include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.48 2004/01/25 18:06:49 hannken Exp $");
+__KERNEL_RCSID(0, "$NetBSD: lfs_balloc.c,v 1.49 2005/02/26 05:40:42 perseant Exp $");
#if defined(_KERNEL_OPT)
#include "opt_quota.h"
@@ -81,6 +81,7 @@
#include <sys/mount.h>
#include <sys/resourcevar.h>
#include <sys/trace.h>
+#include <sys/malloc.h>
#include <miscfs/specfs/specdev.h>
@@ -96,6 +97,8 @@
int lfs_fragextend(struct vnode *, int, int, daddr_t, struct buf **, struct ucred *);
+u_int64_t locked_fakequeue_count;
+
/*
* Allocate a block, and to inode and filesystem block accounting for it
* and for any indirect blocks the may need to be created in order for
@@ -162,6 +165,10 @@
if (bpp)
*bpp = NULL;
+ /* Bomb out immediately if there's no space left */
+ if (fs->lfs_bfree <= 0)
+ return ENOSPC;
+
/* Check for block beyond end of file and fragment extension needed. */
lastblock = lblkno(fs, ip->i_size);
if (lastblock < NDADDR && lastblock < lbn) {
@@ -227,6 +234,10 @@
error = ufs_bmaparray(vp, lbn, &daddr, &indirs[0], &num, NULL, NULL);
if (error)
return (error);
+
+ daddr = (daddr_t)((int32_t)daddr); /* XXX ondisk32 */
+ KASSERT(daddr <= LFS_MAX_DADDR);
+
/*
* Do byte accounting all at once, so we can gracefully fail *before*
* we start assigning blocks.
@@ -295,6 +306,12 @@
if (bpp)
*bpp = bp = getblk(vp, lbn, blksize(fs, ip, lbn), 0, 0);
+ /*
+ * Do accounting on blocks that represent pages.
+ */
+ if (!bpp)
+ lfs_register_block(vp, lbn);
+
/*