Re: Fixing pool_cache_invalidate(9), final step

To: jeanyves.migeon%free.fr@localhost
Subject: Re: Fixing pool_cache_invalidate(9), final step
From: Lars Heidieker <lars.heidieker%googlemail.com@localhost>
Date: Wed, 30 May 2012 22:43:14 +0200

On Wed, May 30, 2012 at 6:49 PM, Jean-Yves Migeon
<jeanyves.migeon%free.fr@localhost> wrote:
> On Tue, 29 May 2012 23:29:10 +0200, Lars Heidieker wrote:
>>
>> On Tue, May 29, 2012 at 9:03 PM, Jean-Yves Migeon <jym%netbsd.org@localhost> 
>> wrote:
>>>
>>> -current: I would like to take a more radical step and perform the
>>> xcall(9) directly from pool_cache_invalidate(9). Given that xcall(9)
>>> broadcasts cannot be sent from interrupt or softint context (because of
>>> the use of condvars), this change is likely to expose bugs in code paths
>>> where invalidation is handled from interrupt context (which is a bad
>>> thing IMHO). See current-pcache-invalidate.diff.
>>>
>> during looking at the patch (current version one) I came across one
>> pitfall (if I'm not mistaken) pool_cache_invalidate_cpu will release
>> the per cpu pool_cache_cpu_t struct which only should happen if the
>> cache is about to be destroyed.
>
>
> Correct! And curiously this never lead to a hard failure on my side, any
> get/put to the pool should make it crash very early (when accessing per-CPU
> caches).
>
> That confirms that my current testing is limited. Hmm.
>

Those pool_cache_cpu struct don't have any active objects, and as long
as the pcachcpu pool in't drained things might seem fine....

>
>> the problem lies in that pool_cache_invalidate_groups releases
>> pool_cache_groups during normal pool_cache_drain activity but
>> pool_cache_invalidate_cpu is intended for getting the pool_cache ready
>> for destruction.
>> calling pool_cache_invalidate_cpu via xcall from pool_drain =>
>> pool_cache_invalidate is problematic.
>
>
> Indeed. The issue lies in the last part of pool_cache_invalidate_cpu()
> though, not putting back the pool_cache_cpu back into the available pool
> could fix this. Perhaps this could be split in two different routines then?
>
> pool_cache_cpu_destroy(...) {
>     pool_cache_invalidate_cpu(...);
>
>     if (cc != &pc->pc_cpu0)
>         pool_put(&cache_cpu_pool, cc);;
> }
>
> A bit like pool_cache_destroy()?
>
> This should be done for -6 and -current though, this reason is valid in both
> situations.
>

Well, if I read the patch right the transfer_xcall will be called from
pool_drain_start and be waited for in pool_drain_end and therefor
calling the other xcall in pool_cache_invalidate that will release the
per cpu cached objects directly into the pool will most likely have no
work to do.
One question puzzles me right now, isn't it more efficient to do the
cache group swapping then to release the cpu locally cached objects
directly to the pool?
Having the cpu cached objects pushed to the pool_cache global cache
and then draining it on the calling cpu seems quite efficient to me.
Just an idea (motivated by the fact that the split of pool_drain is
ugly imho as it exposes an internal state of the draining, the xcall
arg) merge pool_drain_start and pool_drain_end and remove the xcall
from it and do the xcall from pool_cache_invalidate and probably give
it an argument if to wait aka doing it synchronous.
As all cpus might have caches to release not much time will be wasted
and the will content the pool_cache global cache which should be
quicker then the contenting for the underlying pool (I haven't
checked...)
The "normal" drain call from the pagedaemon might be done async (so we
don't wait for a cpu busy with interrupts in that case which might
happen with low priority xcalls and which is one of your concerns if I
got it right).
Except for the async idea, I had this running with high-priority
xcalls localy...

>
>> for the pool_cache destruction case we have to make sure that any
>> pool_cache_invalidate with it's pool_cache_invalidate_groug calls and
>> probably xcall that do the cpu local cache to pool_cache global
>> transfer have finished before pool_cache_invalidate_cpu is called.
>> if pool_cache_xcall is called from pool_drain the we could rejoin
>> pool_drain_start and pool_drain_end....
>
>
> The synchronous pool_cache_invalidate_<whatever> would fit such a goal, no?
>
this should be covered with giving pool_cache_invalidate an arg.

>
>> if my grepping is right pool_reclaim/pool_cache_reclaim are only user
>> in subr_pool.c (for _HARDKERNEL) so draining should indeed only happen
>> from the pageout daemon and not from interrupt context at all, so we
>> should (if I'm not mistaken) be able to simplify be integrating
>> pool_drain_start/end and calling the pool_cache_xcall from
>> pool_cache_invalidate...
>> currently with the split pool_drain_start/end the time the xcall runs
>> might on SMP be used to drain the buffer pool, the only different that
>> comes to my mind quickly.
>
>
> I came to the same conclusion while looking at the drain_start/_end
> functions, but they were outside the scope of my patch (I expected
> oppositions, so kept it as small as possible).
>
I think this comes as a package ;-)

Just some thoughts,
Lars

-- 
Mystische Erklärungen:
Die mystischen Erklärungen gelten für tief;
die Wahrheit ist, dass sie noch nicht einmal oberflächlich sind.
   -- Friedrich Nietzsche
   [ Die Fröhliche Wissenschaft Buch 3, 126 ]

Follow-Ups:
- Re: Fixing pool_cache_invalidate(9), final step
  - From: Jean-Yves Migeon

References:
- Fixing pool_cache_invalidate(9), final step
  - From: Jean-Yves Migeon
- Re: Fixing pool_cache_invalidate(9), final step
  - From: Lars Heidieker
- Re: Fixing pool_cache_invalidate(9), final step
  - From: Jean-Yves Migeon

Prev by Date: Re: Severe netbsd-6 NFS server-side performance issues
Next by Date: Re: Severe netbsd-6 NFS server-side performance issues
Previous by Thread: Re: Fixing pool_cache_invalidate(9), final step
Next by Thread: Re: Fixing pool_cache_invalidate(9), final step
Indexes:

Home | Main Index | Thread Index | Old Index