Subject: Re: vnode locking problem
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 03/24/1999 08:47:21
Bill Sommerfeld writes:
> > that should do well enough for now.
> > it's certainly better than hanging or panicking.
> > when testing, you should try the write-into-mapping case as well
> > as the read-into-mapping case that I described in the PR.
>
> Yup, will do.
>
> Random other thoughts:
>
> - it's clear that this is easier to fix in a unified-buffer-cache
> setup, where VOP_READ gets replaced with "map vnode into kernel VA
> space, copyout, maybe unmap vnode", *assuming* that the copyout is
> done while the syscall has the vnode unlocked.
we can't unlock the vnode during the copyout because of POSIX write
semantics: write() must be atomic with respect to read() and other write()s.
the real solution is to lock different aspects of vnode access with
different locks so that we can read via mappings during VOP_WRITE().
> - it also feels "wrong" that a copyout which replaces the entire
> contents of a page which isn't present first has to bring that page in
> from backing store. (this isn't always going to be the case for
> situations like this, but it may often be).
I've addressed this already in UBC. you might like to check out the
chs-ubc branch and see what I've done. I've got some design info
written up that I should just put in the tree. I'll do that later today.
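to make the case Bill describes concrete, here is a minimal sketch of the
check involved: does a copy overwrite every byte of a given page? if it
does, the old contents can never be observed, so in principle nothing needs
to be read from backing store first. the helper name and parameters are
hypothetical and are not taken from the chs-ubc branch; the sketch also
assumes dst + len does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not from the chs-ubc branch).
 * Returns true when copying `len` bytes to address `dst` overwrites the
 * whole page that starts at `page_start`. Assumes dst + len does not
 * overflow.
 */
static bool
copy_covers_page(uint64_t dst, uint64_t len, uint64_t page_start,
    uint64_t page_size)
{
	uint64_t end;

	assert(page_size > 0);
	end = dst + len;	/* one past the last byte written */

	/* The copy must start at or before the page and end at or after it. */
	return dst <= page_start && end >= page_start + page_size;
}
```

with 4096-byte pages, a 4096-byte copy to address 0 covers the page at 0,
but the same copy to address 1 leaves byte 0 of that page untouched, so the
old contents of the page would still be needed.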
> - Fixing the vnode locking protocols so that reads could be done with
> a shared lock would be desirable (particularly for MP scalability) but
> could be tricky. (you still need an exclusive lock on the file
> pointer, but that's a separate issue). I've been thinking about a
> related issue (using shared locks when possible for VOP_LOOKUP to
> minimize how badly things lose when a filesystem gets unresponsive)
> and may look into this as well after 1.4.
yeah, the current business of taking an exclusive lock for every VOP
is really silly. the whole vnode locking scheme needs to be redone
to use locks in shared mode where possible.
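as a rough illustration of what "shared mode where possible" buys, here is
a toy, single-threaded model of the compatibility rules: any number of
shared holders may coexist, while an exclusive holder excludes everyone.
all names here (toy_lock, toy_trylock, TOY_SHARED, ...) are made up for the
example; this is not the kernel's lock interface, it is not thread-safe,
and it never sleeps, it only reports whether a request would be granted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock modes for this sketch only. */
enum toy_mode { TOY_SHARED, TOY_EXCLUSIVE };

/* Toy lock state: a count of shared holders plus an exclusive flag. */
struct toy_lock {
	int  shared_holders;
	bool exclusive_held;
};

/* Returns true if the request is compatible with the current holders. */
static bool
toy_trylock(struct toy_lock *lk, enum toy_mode mode)
{
	assert(lk != NULL);

	if (lk->exclusive_held)
		return false;		/* exclusive blocks everything */
	if (mode == TOY_SHARED) {
		lk->shared_holders++;	/* readers can pile up */
		return true;
	}
	if (lk->shared_holders > 0)
		return false;		/* a writer must wait for readers */
	lk->exclusive_held = true;
	return true;
}

/* Releases a lock previously granted in the given mode. */
static void
toy_unlock(struct toy_lock *lk, enum toy_mode mode)
{
	assert(lk != NULL);

	if (mode == TOY_SHARED) {
		assert(lk->shared_holders > 0);
		lk->shared_holders--;
	} else {
		assert(lk->exclusive_held);
		lk->exclusive_held = false;
	}
}
```

in this model two readers taking TOY_SHARED both succeed, whereas with an
exclusive lock per operation the second one would always be refused.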
> - the upper-layer code could possibly notice this case of EFAULT,
> unlock everything, touch the pages, relock, and retry the read, but
> knowing when it's safe to redo the read is not immediately clear, and
> this just feels wrong..
... and it is! :-)
> - the comment you had in the PR about the problem also occurring when
> the buffer should have been in BSS: I believe that linkers typically
> place bss immediately after initialized data, without rounding up to
> the next page... so the first part of the buffer was probably still in
> .data and backed by the vnode..
>
> - Bill
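the bss point is just page arithmetic. assuming, as Bill suggests, that
bss starts right where initialized data ends with no rounding, the first
few bytes of bss share a page with the tail of .data, and that page is
backed by the vnode. a small sketch (the function name is hypothetical, and
any addresses fed to it are illustrative, not taken from a real binary):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the address where bss begins and the page
 * size, return how many bytes of bss fall in the same page as the end of
 * initialized data. Returns 0 when bss happens to start on a page
 * boundary.
 */
static uint64_t
bss_bytes_in_data_page(uint64_t bss_start, uint64_t page_size)
{
	uint64_t into_page;

	assert(page_size > 0);
	into_page = bss_start % page_size;	/* offset within the page */

	return into_page == 0 ? 0 : page_size - into_page;
}
```

for example, with 0x1000-byte pages and bss starting at 0x2100, the first
0xf00 bytes of bss sit in the page that also holds the end of .data, so a
buffer at the start of bss would begin in a vnode-backed page.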
-Chuck