Subject: Re: Filesystem locking and cache question.
To: None <wrstuden@netbsd.org>
From: Sung-Won Chung <swchung7@hotmail.com>
List: tech-kern
Date: 01/18/2003 13:50:05
>From: Bill Studenmund <wrstuden@netbsd.org>
>
> > 1. Locking
> >
> > During a relocation of a disk block, other processes should not
> > access that block. I think a simple solution is using a lock for
> > a vnode related with a block under relocation, since we can find
> > the inode corresponding to a block under relocation, though it
> > should traverse file system. This lock is different from vn_lock,
> > since it should protect the whole vnode operations such as
> > VOP_RENAME().
>
>I don't understand what is wrong with using a vnode lock? Just lock the
>node while you're moving the blocks.
For FFS, I thought that if vn_lock is used, there is a chance that VOP_RENAME
may notice that the inode of the source directory changed during its internal
re-locking. Because I'm just a beginner in file system internals,
I didn't know that VOP_RENAME avoids this situation by
setting the IN_RENAME flag, and I can also check that flag.
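
A minimal sketch of that check, written as toy user-space C rather than
kernel code. The names toy_inode, TOY_IN_RENAME, toy_relocate_begin and
toy_relocate_end are all hypothetical; TOY_IN_RENAME only stands in for
the real IN_RENAME flag, and the boolean "locked" field stands in for
the vnode lock:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IN_RENAME; the value is arbitrary. */
#define TOY_IN_RENAME 0x01u

/* Toy in-core inode: only the fields this sketch needs. */
struct toy_inode {
    unsigned flags;   /* would carry IN_RENAME in the real inode */
    bool     locked;  /* stands in for the vnode lock */
};

/*
 * Try to start relocating blocks of ip.
 * Returns 0 and leaves ip locked if relocation may proceed.
 * Returns EBUSY if the lock is already held or a rename is in progress;
 * in the rename case the lock is dropped again before returning.
 */
int toy_relocate_begin(struct toy_inode *ip)
{
    assert(ip != NULL);

    if (ip->locked)
        return EBUSY;
    ip->locked = true;

    if (ip->flags & TOY_IN_RENAME) {
        /* A rename is re-locking around this inode: back off. */
        ip->locked = false;
        return EBUSY;
    }
    return 0;
}

/* Finish relocation and release the toy lock. */
void toy_relocate_end(struct toy_inode *ip)
{
    assert(ip != NULL);
    ip->locked = false;
}
```

The relocation code would retry later when it gets EBUSY. Whether a
simple retry is enough in the real kernel is an assumption of this
sketch.
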
I think that there are some race conditions that cannot be avoided
by vn_lock(). Vnode operations that call ffs_makeinode(),
such as VOP_MKNOD/MKDIR/CREATE(), return a locked vnode for the
created file or directory. If an inode is relocated after it is
allocated by ffs_nodealloc() but before it is registered by VFS_VGET(),
we can't lock the vnode corresponding to the inode under relocation.
Then the relocation program moves the inode without vnode locking,
and its content is lost.
Another possible race is in VOP_LOOKUP(): an inode may be moved
between reading the directory entry and calling VFS_VGET().
When this race condition is possible, VOP_LOOKUP() sets the PDIRUNLOCK
flag to inform the caller before returning an error. However, the
current vfs_lookup() implementation doesn't seem to deal with it.
>There are a number of tricks with UBC that you could play too.
The only UBC interface I know about is ubc_alloc/ubc_release.
I have no idea how to lock with this interface.
Could you give me a few more hints?
> > 2. Cache
> >
> > FFS uses inode, vnode, and buffer cache. After a block is relocated,
> > we should update caches related with the block just moved, before
> > releasing a lock that has prevented entrance of vnode operations
> > related with the block under relocation.
> >
> > Simple solution is, instead of update, 1) flush buffer cache related
> > with the moved block, and 2) flush inode cache related with the moved
> > block, since they have old location of the block.
>
>What do you mean, "instead of update?"
I'm sorry if I confused you. I am not good at English.
By "update" I meant correcting the contents of the buffer cache or
inode cache that still hold the previous location of a block which
has moved to a new location.
> > If we avoid cache flushing, the work to be done for inode is simple.
> > We just update block pointers (di_db or di_ib). However, avoiding the
> > flush of a buffer cache is rather complicated. 1) if the buffer cache
> > contains general data block, we can avoid flush only by changing
> > physical block number in the buffer (b_blkno). 2) if the buffer cache
> > contains directory entry, we can avoid flush by changing inode number
> > field in the relevant directory entry. 3) if the buffer cache contains
> > inode itself, we change file system block pointers, 4) if the buffer
> > cache contains a block containing indirect file system block pointers,
> > we update some of those pointers to reflect the new location of the moved
> > block.
>
>Why do we want to not synchronize the disk and the buffer cache?
I thought that if we synchronize by flushing the invalid cache entries, the
frequently used part of the cache may soon need to be reloaded.
I admit that I was too greedy about not losing the cache.
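
For reference, the in-place update described in the quoted text (cases
1, 3 and 4) can be sketched roughly as below. This is toy user-space C,
not kernel code: the structures toy_dinode and toy_buf and all toy_*
functions are hypothetical and only borrow the field names di_db, di_ib
and b_blkno. The sketch also assumes that block pointers and b_blkno
use the same units, which ignores the conversion between file system
block numbers and device block numbers that real FFS performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NDADDR 12  /* direct block pointers per inode, as in FFS */
#define TOY_NIADDR 3   /* indirect block pointers per inode, as in FFS */

/* Toy on-disk inode: only the block pointer arrays. */
struct toy_dinode {
    int32_t di_db[TOY_NDADDR];
    int32_t di_ib[TOY_NIADDR];
};

/* Toy buffer header: only the physical block number. */
struct toy_buf {
    int64_t b_blkno;
};

/* Rewrite every pointer equal to 'from' so that it holds 'to'. */
static size_t toy_repoint(int32_t *ptrs, size_t n, int32_t from, int32_t to)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (ptrs[i] == from) {
            ptrs[i] = to;
            changed++;
        }
    }
    return changed;
}

/* Case 3: a cached inode still points at the old block location. */
size_t toy_inode_repoint(struct toy_dinode *dp, int32_t from, int32_t to)
{
    assert(dp != NULL);
    return toy_repoint(dp->di_db, TOY_NDADDR, from, to) +
           toy_repoint(dp->di_ib, TOY_NIADDR, from, to);
}

/* Case 4: a cached indirect block holds pointers to the moved block. */
size_t toy_indir_repoint(int32_t *ptrs, size_t nptrs, int32_t from, int32_t to)
{
    assert(ptrs != NULL);
    return toy_repoint(ptrs, nptrs, from, to);
}

/* Case 1: a cached data buffer only needs its physical block number fixed. */
int toy_buf_repoint(struct toy_buf *bp, int64_t from, int64_t to)
{
    assert(bp != NULL);
    if (bp->b_blkno != from)
        return 0;
    bp->b_blkno = to;
    return 1;
}
```

Each function returns how many pointers it changed, so a caller can
tell whether a given cache entry referred to the moved block at all.
Case 2 (directory entries) is left out because it needs the directory
entry layout.
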
> > The difficulty in implementing this idea is that the current buffer cache
> > doesn't know what kind of data it holds. But adding flags that can
> > tell what the buffer has may degrade the file system independence of
> > the buffer cache.
>
>Look at LFS. It routinely moves data blocks around, and so it will show
>you how to do this.
Thank you very much for your consideration and suggestions.
I'll study the LFS code to see how it solves my problems.
Best Regards,
- Sungwon