tech-kern archive
Re: Adding pool_cache_invalidate_local() to pool_cache(9) API
On Oct 8, 2009, at 11:52 PM, Jean-Yves Migeon wrote:
>> Maybe I'm missing something ... how is this ever safe to use? An object can
>> be allocated from one CPU's cache and freed to another.
>
> I hardly see how that would be possible. During pool_get/pool_put, the only
> per-CPU pool cache that is manipulated is the current_cpu() one. If one CPU
> is manipulating the pool_cache of another, bad things will happen anyway, as
> two CPUs could release at the same time in the same pool_cache. Without
> locks, this sounds wrong.
Consider the case of an mbuf. It is allocated from a per-CPU pcg on CPU-A, then
bounces around the networking stack, through socket buffers, etc., and is finally
freed. But the processor that frees it could be CPU-B (for any number of
legitimate reasons), so it goes into one of CPU-B's per-CPU pcgs.
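To make that concrete, here is a minimal sketch of such a lifetime; "mb_cache"
stands in for whatever pool_cache_t backs mbufs, and the CPU annotations are
just commentary on where the code happens to be running:

struct mbuf *m;

/*
 * Running on CPU-A: pool_cache_get() satisfies the request from CPU-A's
 * local pcg when it can, falling back to the global cache otherwise.
 */
m = pool_cache_get(mb_cache, PR_NOWAIT);

/* ... the mbuf travels through the stack, softints, socket buffers ... */

/*
 * Running later on CPU-B (rx interrupt, a different softint, whatever):
 * the free lands in one of CPU-B's pcgs, not CPU-A's.
 */
pool_cache_put(mb_cache, m);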
> The routine that is never safe is the pool_cache_invalidate_cpu(), when
> called for a CPU different from the current one running. But this function is
> never exposed to the outside world.
>
>> The per-CPU cache is simply an optimization, and it seems very wrong to
>> expose this implementation detail to consumers of the API.
>
> I need a way to invalidate all pool caches, even those that are CPU-bound.
> pool_cache_invalidate() only does that for the global cache, as it cannot
> invalidate CPU caches for CPUs other than its own.
CPU-bound is the wrong term here. Nothing is "bound" to the CPU except for a
set of pcgs that cache constructed objects. The objects are merely kept in
CPU-local caches in order to provide a lockless allocation path in the common
case. The objects themselves are not bound to any one CPU.
> Before invalidate_local(), the only way would be to destroy the pool_cache
> entirely, just to release the objects found in pc_cpus[]. This would cripple
> the entire VM system, as it cannot work without the L2 shadow page pool.
What do you mean "destroy the pool_cache entirely"? So you're just working
around a bug in pool_cache_invalidate()? If pool_cache_invalidate() is not
also zapping the per-CPU pcgs for that pool_cache, then that is a bug. The
main intent of pool_cache_invalidate() is to nuke any cached constructed copies
of an object if the constructed form were to change for some reason. If
per-CPU cached copies are not included in that, then a subsequent allocation
could return an incorrectly-constructed object, leading to errant behavior.
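For what it's worth, if pool_cache_invalidate() is to reach the per-CPU pcgs,
one shape that could take -- purely a sketch, assuming the xcall(9) broadcast
facility is usable in this context and that the proposed
pool_cache_invalidate_local() takes only the pool_cache_t -- is to have the
cache broadcast the local invalidation to itself instead of exposing it to
consumers:

#include <sys/types.h>
#include <sys/pool.h>
#include <sys/xcall.h>

static void
pool_cache_invalidate_xc(void *arg1, void *arg2 __unused)
{
	pool_cache_t pc = arg1;

	/* Runs on each CPU in turn, so it only ever touches its own pcgs. */
	pool_cache_invalidate_local(pc);
}

/* Hypothetical wrapper; the point is that the per-CPU drain happens here. */
void
pool_cache_invalidate_all(pool_cache_t pc)
{
	uint64_t where;

	/* Drain every CPU's local groups first... */
	where = xc_broadcast(0, pool_cache_invalidate_xc, pc, NULL);
	xc_wait(where);

	/* ...then the globally cached groups, as pool_cache_invalidate()
	 * already does today. */
	pool_cache_invalidate(pc);
}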
There are subsystems in the kernel that depend on the pool_cache_invalidate()
semantics I describe; see arch/alpha/alpha/pmap.c:pmap_growkernel() for an
example. The L1 PTPs are cached in constructed form (with the L1 PTEs for the
kernel portion of the address space already initialized). If PTPs are added to
the kernel pmap in such a way as to require an additional L1 PTE to link them
up, then already-constructed-but-free L1 PTPs need to be discarded since they
will be missing part of the kernel's address space.
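The shape of that dependency, very roughly (the names below are illustrative
placeholders, not the actual alpha code):

#include <sys/types.h>
#include <sys/systm.h>
#include <sys/pool.h>

static pool_cache_t l1ptp_cache;		/* placeholder cache handle */
extern char *kernel_l1_ptes;			/* placeholder: kernel's L1 PTEs */
extern size_t l1_kernel_offset, l1_kernel_size;	/* placeholders */

/*
 * Constructor: a freshly built L1 PTP already carries the L1 PTEs mapping
 * the kernel portion of the address space, copied from the kernel pmap.
 */
static int
l1ptp_ctor(void *arg, void *obj, int flags)
{

	memcpy((char *)obj + l1_kernel_offset, kernel_l1_ptes, l1_kernel_size);
	return 0;
}

/*
 * Called from pmap_growkernel() right after a new kernel L1 PTE has been
 * installed: every already-constructed-but-free L1 PTP is now missing that
 * PTE and must be discarded -- including any sitting in a per-CPU pcg.
 * If the per-CPU groups are skipped, a later pool_cache_get() can hand
 * back a stale PTP.
 */
static void
l1ptp_discard_stale(void)
{

	pool_cache_invalidate(l1ptp_cache);
}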
Sigh, really the per-CPU pcgs should not be "lockless", but rather should be
mutex-protected but "never contended" ... just in case you DO need to manipulate
them from another CPU (this is how Solaris's kmem_cache works, or at least it did
at one point).
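Concretely, the idea would look something like this (structure and names
invented for illustration, not a proposed API):

#include <sys/types.h>
#include <sys/mutex.h>

#define	PCG_NOBJS	15		/* arbitrary, just for the sketch */

struct pcg_cpu {
	kmutex_t	 pcg_lock;	/* taken almost only by the owning CPU */
	void		*pcg_objs[PCG_NOBJS];
	u_int		 pcg_avail;
};

/*
 * Fast path, always on the owning CPU: the mutex is never contended there,
 * so taking it costs little more than the current lockless path.
 */
static void *
pcg_get_local(struct pcg_cpu *pcg)
{
	void *obj = NULL;

	mutex_enter(&pcg->pcg_lock);
	if (pcg->pcg_avail != 0)
		obj = pcg->pcg_objs[--pcg->pcg_avail];
	mutex_exit(&pcg->pcg_lock);
	return obj;
}

/*
 * Slow path, callable from any CPU: an invalidation can safely drain a
 * remote CPU's group because it respects the same mutex.
 */
static void
pcg_drain(struct pcg_cpu *pcg, void (*release)(void *))
{
	mutex_enter(&pcg->pcg_lock);
	while (pcg->pcg_avail != 0)
		(*release)(pcg->pcg_objs[--pcg->pcg_avail]);
	mutex_exit(&pcg->pcg_lock);
}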
> In Xen's case, pool_cache_invalidate_local() would be used by each CPU during
> its detach routine (just before its shutdown), to clear all its associated L2
> pages that Xen cannot handle.
So, we've added a huge abstraction violation to the pool API to handle this
special case of Xen shutting down?
I'm sorry, but a different solution needs to be found. Exposing this internal
implementation detail of pools is just wrong.
-- thorpej