tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: AMAP_SHARED (was Re: XIP)
On Mon, Nov 01, 2010 at 02:32:23AM +0900, Masao Uebayashi wrote:
> Hmmmmm. I may be missing something seriously. I think I've
> understood the following description you wrote, and I got hit this
> in the very early development stage ... >1 year ago. It was
> uvm_fault() -> uvmfault_promote() -> amap_add() -> pmap_page_protect()
> Since then I've assumed that shared amap is a pretty much common
> thing... Now I realize it should not be, as you describe. Worse,
> I can't reproduce that code path...
>
> Now I have to *really* understand how this works...
I still have no idea how I was hit by this. Strange.
Considering how AMAP_SHARED is used, especially it's backed by
vnode, this is only for highly tuned server/client programs which
share initialized data (.data). While such a use case may have
some value, I'd say this is a rare feature.
I append a test program.
>
> (I'll respond about this topic again later.)
>
> Masao
>
> On Tue, Oct 26, 2010 at 02:06:38PM -0700, Chuck Silvers wrote:
> (snip)
> > now here's the explanation I promised for how to treat XIP pages
> > as unmanaged instead of managed.
> >
> > first, some background for other people who don't know all this:
> > the only reason that treating XIP pages as managed pages is
> > relevant at all is because of the AMAP_SHARED flag in UVM,
> > which allows anonymous memory to be shared between processes
> > such that the changes made by one process are seen by the other.
> > this impacts XIP pages (which are not anonymous) because a
> > PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
> > the XIP pages as long as all access to the mapping is for reads,
> > but when the mapping is written to then the XIP page should be
> > copied to an anonymous page (the normal COW operation) but that
> > new anonymous page should still be shared between all processes
> > that are sharing the AMAP_SHARED mapping. to force those other
> > processes to take another page fault the next time they access
> > their copy of the mapping (which we need to do so that they will
> > start accessing the new anonymous page instead of the XIP page),
> > we must invalidate all the other pmap entries for the XIP page,
> > which we do by calling pmap_page_protect() on it. the pmap layer
> > tracks all the mappings of the page and thus it can find them all.
> >
> > there are several ways that the AMAP_SHARED functionality is used,
> > and unfortunately they would need to changed in different ways to
> > make this work for XIP pages without needing to track mappings:
> >
> > - uvm_io(), which copies data between the kernel or current process
> > and an arbitrary other process address space.
> > currently this works by sharing the other address space with
> > the kernel via uvm_map_extract() and then just using uiomove()
> > to transfer the data. this could be done instead by using part
> > of the uvm_fault() code to find the physical page in the other
> > address space that we want to access and lock it (ie. set PG_BUSY),
> > map the page into the kernel (perhaps using uvm_pager_mapin()),
> > transfer the data, then unmap the page and unlock it.
> >
> > - uvm_mremap(), which resizes an existing mapping.
> > this uses uvm_map_extract() internally, which uses AMAP_SHARED,
> > but the mremap operation doesn't actually need the semantics of
> > AMAP_SHARED since as mremap doesn't create any additional mappings
> > as far as applications are concerned. the usage of AMAP_SHARED
> > is just a side-effect of the current implementation, which bends
> > over backward to call a bunch of existing higher-level functions
> > rather than doing something more direct (which would be simpler
> > and more efficient as well).
> >
> > - MAP_INHERIT_SHARE, used to implement minherit().
> > this is the one that is the most trouble, since it's what AMAP_SHARED
> > was invented for. however, it's also of least importance since
> > some searching with google finds absolutely no evidence of
> > any application actually using it, just lots of copies of
> > the implementations and manpages for various operating systems.
> >
> > with that in mind, there are several ways this could be handled.
> > (1) just drop support for minherit() entirely.
> > (2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
> > (3) copy XIP pages into normal anon pages when setting
> > MAP_INHERIT_SHARE.
> > (4) copy XIP pages into normal vnode pages when setting
> > MAP_INHERIT_SHARE. this would mean that the getpages
> > code would need to look in the normal page cache
> > before using XIP pages. I think this option would also
> > need getpages to know about the inherit flag to
> > correctly handle later faults on XIP mappings,
> > and there are probably other sublte complications.
> >
> > of these choices, (2) sounds like the best compromise to me.
> >
> >
> > this approach would also bring back some issues where our previous
> > discussion went around in circles, such as callers of VOP_GETPAGES()
> > wanting vm_page pointers but XIP pages not having them, but
> > I'll leave that additional discussion for future email if necessary.
So you're OK to address this AMAP_SHARED mess after the merge.
I'll come back these interesting topics later.
Masao
> >
> > I'm just going over all this now so that it's clear to everyone
> > that this kind of approach is possible if the memory overhead of
> > the full vm_page structures for XIP pages is deemed too high.
> >
> >
> > -Chuck
>
> --
> Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
--
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
/* $Id$ */
#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/cdefs.h>
#include <sys/mman.h>
#define SIZE 4096 /* PAGE_SIZE */
char buf_a[SIZE] __aligned(SIZE) = { [0] = 1 };
char buf_b[SIZE] __aligned(SIZE) = { [0] = 1 };
int
main(int ac, char *av[])
{
int error = 0;
pid_t pid;
int status;
error = minherit(buf_b, SIZE, MAP_INHERIT_SHARE);
if (error != 0)
err(1, "minherit(MAP_INHERIT_SHARE)");
if ((pid = fork()) < 0)
err(1, "fork");
switch (pid) {
case 0:
sleep(1);
printf("buf_a[0] = %d\n", buf_a[0]);
printf("buf_b[0] = %d\n", buf_b[0]);
break;
default:
buf_a[0] = -1;
buf_b[0] = -1;
(void)wait(&status);
break;
}
return 0;
}
Home |
Main Index |
Thread Index |
Old Index