tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: callout_stop => callout_halt
On Mon, Nov 17, 2014 at 2:28 PM, Taylor R Campbell
<campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> Date: Mon, 17 Nov 2014 12:52:05 +0900
> From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
>
> On Mon, Nov 17, 2014 at 3:00 AM, Taylor R Campbell
> <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> > My inclination is to nix sme_callout_mtx altogether and replace every
> > use of it by sme_mtx (there are obviously some races where one thread
> > holds one and another thread holds another and they try to touch the
> > same data structure), but I don't know this code well enough to say
> > for sure.
>
> Sounds reasonable for me. One concern is that the callout function
> calls workqueue_enqueue that enqueues a work of IPL_SOFTCLOCK and
> the work also tries to acquire sme_mtx. IIUC, we need to splsoftclock
> in the callout function to avoid deadlocks. Is it right?
>
> Not quite. There are a few things going on here...
>
> 1. The IPL of the workqueue is the highest IPL from which we are going
> to queue work, so that the workqueue thread can block interrupts at
> that level while it grabs work from its queue data structure. It's
> safe to call workqueue_enqueue at any IPL below that too. When we're
> in the callout, we're at IPL_SOFTCLOCK, so it's OK.
Oh, I didn't know that (callout functions run at IPL_SOFTCLOCK)!
So I'm convinced to the following discussion.
>
> 2. When an interrupt handler needs to take a lock that threads might
> want to use, the threads need to block the interrupt so that it
> doesn't come in on the same CPU while the thread holds the lock --->
> deadlock. It's not the interrupt handler that needs to spl*; it's
> everyone who *else* takes the lock. That's why we pass an IPL to
> mutex_init: so mutex_enter can raise to it and mutex_exit can restore
> the IPL. (It is ~never the case that you need to write explicit calls
> to spl* in modern APIs.)
Yes.
>
> 3. Presumably that's why sme_callout_mtx exists -- I hadn't noticed
> before that it was at IPL_SOFTCLOCK while sme_mtx is at IPL_NONE.
> (That's not enough, though: the locking looks broken to me because not
> all the envsys code is consistent for each data structure about
> whether you need sme_mtx or sme_callout_mtx.)
>
> 4. However, for the purposes of mutex(9), IPL_NONE and all the
> IPL_SOFT* are equivalent. So in this case it almost certainly doesn't
> matter in practice, and it's probably no worse than the rest of the
> system to just use sme_mtx with IPL_NONE for now...
I see, though I'm not sure I understand it completely
> although on
> discussion with mrg@, it sounds like there are some pretty deep bugs
> in the way that softints/callouts and blocking work, which can be the
> subject of a separate thread. Argh!
...so I'm looking forward to the new thread :)
>
> So the patch is updated: http://www.netbsd.org/~ozaki-r/callout_halt.diff
>
> I know you didn't introduce this, but while I'm looking at that I
> realize that some of the code you're touching is calling *_destroy
> routines under a lock, which is usually not OK. callout_destroy is an
> exception that happens to be OK, but, e.g., workqueue_destroy may wait
> for work in progress to complete, and kmem_free may call into uvm and
> wait for things inside that.
Okay, I'm trying to get rid of them that require some rearrangements.
Thanks,
ozaki-r
Home |
Main Index |
Thread Index |
Old Index