tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: XIP
On Mon, Oct 25, 2010 at 02:09:43AM +0900, Masao Uebayashi wrote:
> I think the uebayasi-xip branch is ready to be merged.
hi,
here's what I found looking at the current code in the branch:
- the biggest issue I had with the version that I reviewed earlier
was that it muddled the notion of a "managed" page. you wanted
to create a new kind of partially-managed page for XIP devices
which would not be managed by the UVM in the sense that it could
contain different things depending on what was needed but only that
the page's mappings would be tracked so that pmap_page_protect()
could be called on it. this is what led to all the pmap changes
the pmaps needed to be able to handle being called with a vm_page
pointer that didn't actually point to a struct vm_page.
it looks like you've gotten rid of that, which I like, but you've
replaced it with allocating a full vm_page structure for every page in
an XIP device, which seems like a waste of memory. as we discussed
earlier, I think it would be better to treat XIP pages as unmanaged
and change a few other parts of UVM to avoid needing to track the
mappings of XIP page mappings. I have thoughts on how to do all that,
which I'll list at the end of this mail. however, if XIP devices
are small enough that the memory overhead of treating device pages
as managed is reasonable, then I'll go along with it.
so how big do these XIP devices get?
either way, the changes to the various pmaps to handle the fake vm_page
pointers aren't necessary anymore, so all those changes should be reverted.
it doesn't look like the PMAP_UNMANAGED flag that you added before
is necessary anymore either, is that right? if so, it should also
be reverted. if not, what's it for now?
if we do keep the vm_page structures for device pages, I think
it would be better to put them in the same vm_physseg array as
the normal memory rather than having a parallel set of functions
and data structures for device pages. device pages would just be
loaded with avail_start and avail_end indicating that none of the pages
should be added to the freelist. is there any other way that
device pages need to be treated differently from normal pages
as far as their physseg information is concerned?
- I mentioned before that I think the driver for testing this using
normal memory to simulate an XIP device should just be the existing
"md" driver, rather than a whole new driver whose only purpose
would be testing the XIP code. however, you wrote a separate
"xmd" driver anyway. so I'll say again: I think the "xmd" should be
merged into the "md" driver.
you also have an "xmd_machdep.c" for various platforms, but nothing
in that file is really machine-specific. rather than use the
machine-specific macros for converting between bytes and pages,
it would be better to either use the MI macros (ptoa / atop)
or just shift by PAGE_SHIFT, so that there doesn't need to be
extra code written for each platform.
- any function that might be called from a kernel module should always exist,
regardless of which kernel options are enabled. the particular one
that I noticed is uvm_page_physload_device(). in general, it seems to me
that if this were better intergrated then it wouldn't be worthwhile having
an XIP kernel option at all, since there would be very little code to omit.
- trivial amounts of code in kernel modules shouldn't be compile-time
conditional either, so that there don't need to be separate binaries
for the module with and without the option. the various XIP bits in xmd.c
fall into this category.
- as we discussed before, the addition of bus_space_physload() and related
interfaces seems very strange to me, especially since the two implementations
you've added both do exactly the same thing (call bus_space_mmap() on the
start and end of the address range, and then call uvm_page_physload()
with the resulting paddr_t values). is there any reason this can't be
a MI function? also, the non-device versions of these are unused.
- in many files the only change you made was to replace the include of
<uvm/uvm_extern.h> with <uvm/uvm.h>, what's the reason for that?
- why is the "majors" entry for the "flash" driver in the MD majors files?
isn't this an MI driver?
- in flash.c, most of the body of flash_init() is ifdef'd out,
what's up with that?
- we talked before about removing the "xip" mount option and enabling
this functionality automatically when it's available, which you did
but then recently changed back. so I'll ask again,
why do we need a mount option?
- as we've discussed before, I think that the XIP genfs_getpages()
should be merged with the existing one before you merge this into
-current. merging it as-is would make a mess, and it's better to
avoid creating messes than than to promise to clean them up later.
we've probably already spent more time arguing about it than
it would have taken you to do it.
- similarly, I don't think that we should have a separate XIP putpages.
why doesn't the existing genfs_do_putpages() work for XIP pages as-is?
- in getpages, rather than allocating a page of zeros for each file,
it would be better to use the same page of zeroes that mem.c uses.
if multiple subsystems need a page of zeroes, then I think that
UVM should provide this functionality rather than duplicating
that all over the place. if you don't want to update all the copies
of mem.c, that's fine, but please put this in UVM instead of genfs.
- I don't remember if I asked about this before: is there some reason
for there to be XIP code in the fs-specific code? couldn't this be
done in the VFS and vnode code instead so that it works for all
file system types instead of just FFS?
- as you guessed in your mail, I don't like the new PQ_FIXED flag either.
it looks like it's only used for preventing the XIP pages from
being put on the paging queues, but that would be better done by
just initializing pg->wire_count to 1 so that UVM treats them
as permanently non-pageable (which they actually are).
that's all I noticed in the current version of the code.
I haven't yet gone back and looked at my previous comments to see if
I overlooked anything this time around, I'll do that tomorrow.
needless to say, I don't think this work is ready to be merged yet.
now here's the explanation I promised for how to treat XIP pages
as unmanaged instead of managed.
first, some background for other people who don't know all this:
the only reason that treating XIP pages as managed pages is
relevant at all is because of the AMAP_SHARED flag in UVM,
which allows anonymous memory to be shared between processes
such that the changes made by one process are seen by the other.
this impacts XIP pages (which are not anonymous) because a
PROT_WRITE, MAP_PRIVATE mapping of an XIP vnode should point to
the XIP pages as long as all access to the mapping is for reads,
but when the mapping is written to then the XIP page should be
copied to an anonymous page (the normal COW operation) but that
new anonymous page should still be shared between all processes
that are sharing the AMAP_SHARED mapping. to force those other
processes to take another page fault the next time they access
their copy of the mapping (which we need to do so that they will
start accessing the new anonymous page instead of the XIP page),
we must invalidate all the other pmap entries for the XIP page,
which we do by calling pmap_page_protect() on it. the pmap layer
tracks all the mappings of the page and thus it can find them all.
there are several ways that the AMAP_SHARED functionality is used,
and unfortunately they would need to changed in different ways to
make this work for XIP pages without needing to track mappings:
- uvm_io(), which copies data between the kernel or current process
and an arbitrary other process address space.
currently this works by sharing the other address space with
the kernel via uvm_map_extract() and then just using uiomove()
to transfer the data. this could be done instead by using part
of the uvm_fault() code to find the physical page in the other
address space that we want to access and lock it (ie. set PG_BUSY),
map the page into the kernel (perhaps using uvm_pager_mapin()),
transfer the data, then unmap the page and unlock it.
- uvm_mremap(), which resizes an existing mapping.
this uses uvm_map_extract() internally, which uses AMAP_SHARED,
but the mremap operation doesn't actually need the semantics of
AMAP_SHARED since as mremap doesn't create any additional mappings
as far as applications are concerned. the usage of AMAP_SHARED
is just a side-effect of the current implementation, which bends
over backward to call a bunch of existing higher-level functions
rather than doing something more direct (which would be simpler
and more efficient as well).
- MAP_INHERIT_SHARE, used to implement minherit().
this is the one that is the most trouble, since it's what AMAP_SHARED
was invented for. however, it's also of least importance since
some searching with google finds absolutely no evidence of
any application actually using it, just lots of copies of
the implementations and manpages for various operating systems.
with that in mind, there are several ways this could be handled.
(1) just drop support for minherit() entirely.
(2) reject attempts to set MAP_INHERIT_SHARE on XIP mappings.
(3) copy XIP pages into normal anon pages when setting
MAP_INHERIT_SHARE.
(4) copy XIP pages into normal vnode pages when setting
MAP_INHERIT_SHARE. this would mean that the getpages
code would need to look in the normal page cache
before using XIP pages. I think this option would also
need getpages to know about the inherit flag to
correctly handle later faults on XIP mappings,
and there are probably other sublte complications.
of these choices, (2) sounds like the best compromise to me.
this approach would also bring back some issues where our previous
discussion went around in circles, such as callers of VOP_GETPAGES()
wanting vm_page pointers but XIP pages not having them, but
I'll leave that additional discussion for future email if necessary.
I'm just going over all this now so that it's clear to everyone
that this kind of approach is possible if the memory overhead of
the full vm_page structures for XIP pages is deemed too high.
-Chuck
Home |
Main Index |
Thread Index |
Old Index