Subject: Re: uvm_loan(), page daemon, pmap_kenter
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 04/06/1999 17:20:57
Jason Thorpe writes:
> Hi folks...
>
> So, I've been thinking of using uvm_loan() for a few things, including
> zero-copy transmit path of network connections (receive is a little
> harder, and I will think more about that later :-).
>
> The major problem with uvm_loan() as I see it is that it requires PMAP_NEW
> in order to work properly if you're loaning pages to the kernel. (It will
> function without it, just not correctly, specifically, when memory is scarce.)
how does !PMAP_NEW break things?
(we should just finish implementing PMAP_NEW everywhere anyway.)
> I discussed this briefly w/ Chuck, who outlined an example of why it's
> required:
>
> 1. User loans page from vnode object to kernel (for an mbuf,
> for example).
>
> 2. Pages get scarce, page daemon starts scanning active pages.
>
> 3. Page daemon decides to deactivate the page (can happen to active
> pages if there is an inactive shortage). This causes a
> pmap_page_protect(..., VM_PROT_NONE) to happen on the page.
>
> 4. Because of the way loaning works (i.e. the kernel loan mappings
> aren't "real", because that would cause really painful map
> fragmentation), the kernel will die when it faults on the
> revoked mapping, because it can't look up the object that
> the page belongs to.
>
> As I see it, the problem is step #3. When a page is loaned to the kernel,
> the kernel mapping is wired. This should cause wire_count to increase for
> the page. Now, take a look at the following code block from uvmpd_scan():
>
> /*
> * deactivate this page if there's a shortage of
> * inactive pages.
> */
> if (inactive_shortage > 0) {
> pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
> /* no need to check wire_count as pg is "active" */
> uvm_pagedeactivate(p);
> uvmexp.pddeact++;
> inactive_shortage--;
> }
>
> I think that block is code is bogus. If you look at uvm_pagedeactivate(),
> you'll note that it expects the caller to make sure the page is not wired.
>
> In fact, uvm_pagedeactivate() includes a DIAGNOSTIC panic for the case
> where wire_count is != 0! HOWEVER, that case is not triggered here
> because that path is NOT taken if the page was on the active list, which
> is the case here in uvmpd_scan().
>
> Wired pages, by definition, should not take faults, and given that the page
> was loaned to a wired kernel mapping, the page should not take a fault
> in the kernel. Therefore, I think that uvmpd_scan() should not deactivate
> pages that are wired.
>
> I think the following diff is all that's needed to solve this case:
>
> Index: uvm_pdaemon.c
> ===================================================================
> RCS file: /cvsroot/src/sys/uvm/uvm_pdaemon.c,v
> retrieving revision 1.15
> diff -c -r1.15 uvm_pdaemon.c
> *** uvm_pdaemon.c 1999/03/30 10:12:01 1.15
> --- uvm_pdaemon.c 1999/04/06 19:05:19
> ***************
> *** 1093,1103 ****
>
> /*
> * deactivate this page if there's a shortage of
> ! * inactive pages.
> */
> ! if (inactive_shortage > 0) {
> pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
> - /* no need to check wire_count as pg is "active" */
> uvm_pagedeactivate(p);
> uvmexp.pddeact++;
> inactive_shortage--;
> --- 1093,1102 ----
>
> /*
> * deactivate this page if there's a shortage of
> ! * inactive pages and it's not wired.
> */
> ! if (inactive_shortage > 0 && p->wire_count == 0) {
> pmap_page_protect(PMAP_PGARG(p), VM_PROT_NONE);
> uvm_pagedeactivate(p);
> uvmexp.pddeact++;
> inactive_shortage--;
>
> Comments on this part?
wired pages should not be on the active queue (or any paging queue).
uvm_pagewire() deals with this.
> I do think, however, that there is a more general issue to be dealt with.
>
> What to do with pmap_page_protect(..., VM_PROT_NONE) when there are
> wired mappings for a page? Some pmaps currently skip revoking a mapping
> (revoking, not removing ... pmap_remove() always works :-) if the mapping
> is wired (usually indicated by a software PTE bit). Here's an example
> from the hp300 pmap:
>
> /*
> * pmap_page_protect: [ INTERFACE ]
> *
> * Lower the permission for all mappings to a given page to
> * the permissions specified.
> */
> void
> pmap_page_protect(pa, prot)
> paddr_t pa;
> vm_prot_t prot;
> {
> struct pv_entry *pv;
> int s;
>
> .
> .
> .
>
> /* [VM_PROT_NONE case] */
> pv = pa_to_pvh(pa);
> s = splimp();
> while (pv->pv_pmap != NULL) {
> pt_entry_t *pte;
>
> pte = pmap_pte(pv->pv_pmap, pv->pv_va);
> #ifdef DEBUG
> if (!pmap_ste_v(pv->pv_pmap, pv->pv_va) ||
> pmap_pte_pa(pte) != pa)
> panic("pmap_page_protect: bad mapping");
> #endif
> if (!pmap_pte_w(pte))
> pmap_remove_mapping(pv->pv_pmap, pv->pv_va,
> pte, PRM_TFLUSH|PRM_CFLUSH);
> else {
> pv = pv->pv_next;
> #ifdef DEBUG
> if (pmapdebug & PDB_PARANOIA)
> printf("%s wired mapping for %lx not removed\n",
> "pmap_page_protect:", pa);
> #endif
> if (pv == NULL)
> break;
> }
> }
> splx(s);
> }
>
> I think this is correct behavior, and in some sense, would prevent part
> of this problem. All pmaps should probably be modified in this way (actually,
> it might be nice to revisit the pmap_page_protect() interface and change it
> into two functions: pmap_copy_on_write() [which is what it used to be, to
> handle the VM_PROT_READ case] and pmap_revoke() [to handle the VM_PROT_NONE
> case]). XXX Must consider VM_PROT_EXEC for systems where the I-cache must
> be flushed, e.g. Alpha. Should instrument pmap_page_protect() to see if
> VM_PROT_EXEC is ever actually passed to that function.
the problem is that pmap_page_protect() is not well-defined for wired pages.
I would argue that pmap_page_protect() should never be called on a wired page,
and that pmap_page_protect() should panic if that happens. this assertion
probably doesn't play well with the current loaning code, but I think it would
be a good direction to head.
copy-on-write and wired pages don't mix. wiring a COW page should force
an immediate copy, as should COWing a wired page. so if pmap_page_protect()
is split, the pmap_copy_on_write() variety should break the COW for wired
pages and the pmap_revoke() variety should panic for wired pages.
> But another thing to consider is what happens when you mark a page as
> copy-on-write while it's loaned to the kernel. I haven't thought about
> all of those issues yet... It's certainly wrong to not revoke write perms
> on a page if it's being marked COW, and pmap_kenter gets this wrong. I
> guess loaned pages are meant to be read-only, yes?
COW and loaning don't mix either, at least not the loan-to-kernel form.
COW depends on the page not being wired, and loan-to-kernel depends on
the page being wired. can't have both at the same time.
the other forms of loaning could probably be mixed with COW mappings.
loans are basically a fancy form of COW which allow sharing of pages
between different types of vm constructs (uvm_objects, anons, wired
kernel pages).
-Chuck (the other)
> -- Jason R. Thorpe <thorpej@nas.nasa.gov>