tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: callout_stop => callout_halt



On Mon, Nov 17, 2014 at 4:10 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> On Mon, Nov 17, 2014 at 2:28 PM, Taylor R Campbell
> <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>>    Date: Mon, 17 Nov 2014 12:52:05 +0900
>>    From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
>>
>>    On Mon, Nov 17, 2014 at 3:00 AM, Taylor R Campbell
>>    <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>>    > My inclination is to nix sme_callout_mtx altogether and replace every
>>    > use of it by sme_mtx (there are obviously some races where one thread
>>    > holds one and another thread holds another and they try to touch the
>>    > same data structure), but I don't know this code well enough to say
>>    > for sure.
>>
>>    Sounds reasonable for me. One concern is that the callout function
>>    calls workqueue_enqueue that enqueues a work of IPL_SOFTCLOCK and
>>    the work also tries to acquire sme_mtx. IIUC, we need to splsoftclock
>>    in the callout function to avoid deadlocks. Is it right?
>>
>> Not quite.  There are a few things going on here...
>>
>> 1. The IPL of the workqueue is the highest IPL from which we are going
>> to queue work, so that the workqueue thread can block interrupts at
>> that level while it grabs work from its queue data structure.  It's
>> safe to call workqueue_enqueue at any IPL below that too.  When we're
>> in the callout, we're at IPL_SOFTCLOCK, so it's OK.
>
> Oh, I didn't know that (callout functions run at IPL_SOFTCLOCK)!
> So I'm convinced to the following discussion.
>
>>
>> 2. When an interrupt handler needs to take a lock that threads might
>> want to use, the threads need to block the interrupt so that it
>> doesn't come in on the same CPU while the thread holds the lock --->
>> deadlock.  It's not the interrupt handler that needs to spl*; it's
>> everyone who *else* takes the lock.  That's why we pass an IPL to
>> mutex_init: so mutex_enter can raise to it and mutex_exit can restore
>> the IPL.  (It is ~never the case that you need to write explicit calls
>> to spl* in modern APIs.)
>
> Yes.
>
>>
>> 3. Presumably that's why sme_callout_mtx exists -- I hadn't noticed
>> before that it was at IPL_SOFTCLOCK while sme_mtx is at IPL_NONE.
>> (That's not enough, though: the locking looks broken to me because not
>> all the envsys code is consistent for each data structure about
>> whether you need sme_mtx or sme_callout_mtx.)
>>
>> 4. However, for the purposes of mutex(9), IPL_NONE and all the
>> IPL_SOFT* are equivalent.  So in this case it almost certainly doesn't
>> matter in practice, and it's probably no worse than the rest of the
>> system to just use sme_mtx with IPL_NONE for now...
>
> I see, though I'm not sure I understand it completely
>
>> although on
>> discussion with mrg@, it sounds like there are some pretty deep bugs
>> in the way that softints/callouts and blocking work, which can be the
>> subject of a separate thread.  Argh!
>
> ...so I'm looking forward to the new thread :)
>
>>
>>    So the patch is updated: http://www.netbsd.org/~ozaki-r/callout_halt.diff
>>
>> I know you didn't introduce this, but while I'm looking at that I
>> realize that some of the code you're touching is calling *_destroy
>> routines under a lock, which is usually not OK.  callout_destroy is an
>> exception that happens to be OK, but, e.g., workqueue_destroy may wait
>> for work in progress to complete, and kmem_free may call into uvm and
>> wait for things inside that.
>
> Okay, I'm trying to get rid of them that require some rearrangements.

Here is a patch: http://www.netbsd.org/~ozaki-r/workqueue_destroy.diff .
The patch is applied on top of
http://www.netbsd.org/~ozaki-r/callout_halt.diff .

I wish we could call workqueue_destroy within a mutex...

  ozaki-r


Home | Main Index | Thread Index | Old Index