tech-net archive
Re: On softints, softnet_lock and sleeping (aka ipv6 vs USB network interfaces)
Date: Sun, 6 Dec 2015 09:22:05 +0000
From: Nick Hudson <skrll%netbsd.org@localhost>
It seems to me that nd6_timer is either expecting too much of
the USB stack by expecting a synchronous interface to changing
multicast filters that doesn't sleep; or the USB stack should
provide an asynchronous update method and any failure should be
handled elsewhere.
One quick fix might be to change nd6_timer to call in6_purgeaddr in a
workqueue (or...task, if we had that). It seems to me that
in6_purgeaddr is a relatively expensive operation (and I think there's
a bug: it calls callout_stop via nd6_dad_stop/nd6_dad_stoptimer when
it should probably call callout_halt), so calling it from a callout
doesn't seem right anyway.
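To make the deferral idea concrete, here is a minimal user-space sketch, not kernel code. The timer path only queues a work item, and the expensive purge runs later from a context that may sleep. All names here (purge_work, purge_queue, timer_expire, worker_drain) are hypothetical; in the kernel the queue would presumably be a workqueue(9), and the line marked as a stand-in is where in6_purgeaddr would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an expired address awaiting purge. */
struct purge_work {
	struct purge_work *next;
	int addr_id;	/* illustrative identifier only */
	int purged;	/* set once the deferred purge has run */
};

/* Simple FIFO of pending purges (assumption: caller provides locking). */
struct purge_queue {
	struct purge_work *head;
	struct purge_work **tailp;
};

void
purge_queue_init(struct purge_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Timer (callout) side: O(1), never sleeps, only queues the item. */
void
timer_expire(struct purge_queue *q, struct purge_work *w)
{
	assert(q != NULL && w != NULL);
	w->next = NULL;
	w->purged = 0;
	*q->tailp = w;
	q->tailp = &w->next;
}

/*
 * Worker side: runs in a context where sleeping is allowed.
 * Returns the number of items processed.
 */
int
worker_drain(struct purge_queue *q)
{
	struct purge_work *w;
	int n = 0;

	while ((w = q->head) != NULL) {
		q->head = w->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		w->purged = 1;	/* stand-in for the real in6_purgeaddr call */
		n++;
	}
	return n;
}
```

The point of the split is that nothing on the timer side can sleep or touch the multicast filter; whether the worker side may still hold softnet_lock across the purge is a separate question.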
Another problem in the PR is that
1) CPU N (not 0) takes softnet_lock and requests a USB control transfer
(which will sleep for completion)
2) CPU 0 takes clock interrupt and nd6_timer expires. nd6_timer starts and
tries to take softnet lock and blocks
3) CPU 0 also runs ipintr (not sure why) which tries to take softnet_lock and blocks
Aside: This is probably because ipintr gets scheduled on a specified
target CPU, not on the local CPU, in pktq_enqueue...and apparently
every caller, except for bridges, specifies CPU 0.
4) CPU 0 receives USB HC interrupt for completed control transfer from CPU N
and schedules softint process (at IPL_SOFTNET) which never runs as the lwp
is blocked in step 3)
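The four steps above amount to a circular wait. As a toy model only (the actor names and the wait-for table are my own illustration, not anything in the kernel), it can be written down and checked like this:

```c
#include <assert.h>

/* Hypothetical actors taken from the numbered steps above. */
enum actor {
	CPU_N_THREAD,		/* holds softnet_lock, sleeps on the USB transfer */
	CPU0_SOFTNET_LWP,	/* softint lwp on CPU 0, blocked on softnet_lock */
	USB_SOFTINT,		/* completion softint scheduled on CPU 0 */
	NACTORS
};

#define NOBODY	(-1)

/*
 * waits_on[i] names the actor that actor i is waiting for, or NOBODY.
 * Returns 1 if following the chain from start leads back to start.
 */
int
in_wait_cycle(const int waits_on[], int n, int start)
{
	int cur = waits_on[start];
	int steps;

	for (steps = 0; steps < n && cur != NOBODY; steps++) {
		if (cur == start)
			return 1;
		cur = waits_on[cur];
	}
	return 0;
}

/* Fill in the wait-for edges as described in the PR analysis. */
void
pr_scenario(int waits_on[NACTORS])
{
	/* 1) the lock holder sleeps until the USB completion softint runs */
	waits_on[CPU_N_THREAD] = USB_SOFTINT;
	/* 2)/3) the softnet lwp on CPU 0 waits for the lock holder */
	waits_on[CPU0_SOFTNET_LWP] = CPU_N_THREAD;
	/* 4) the USB softint cannot run until that lwp is unblocked */
	waits_on[USB_SOFTINT] = CPU0_SOFTNET_LWP;
}
```

In this model, calling pr_scenario and then in_wait_cycle for any actor reports a cycle. Setting waits_on[USB_SOFTINT] to NOBODY, i.e. assuming the completion could run independently of the blocked lwp, removes it.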
Maybe
290 #define IPL_SOFTUSB IPL_SOFTNET
http://nxr.netbsd.org/xref/src/sys/dev/usb/usbdi.h#290
should be changed to IPL_SOFTBIO?
As a practical matter, I don't see how that would help -- IPL_SOFTUSB
is only ever used as a mutex ipl, and IPL_SOFT* is equivalent to
IPL_NONE for the practical purposes of mutex(9).
That aside, can softints even interrupt softints, or are the
priorities only about who goes first if two softints are scheduled
`simultaneously' (as far as softint_dispatch can discern)?
If so, even then, using SOFTINT_BIO instead of SOFTINT_NET wouldn't
help here -- SOFTINT_BIO is even lower-priority than SOFTINT_NET, but
you need to allow the USB HC interrupt to run (I assume you mean,
e.g., ehci_softintr?) faster than SOFTINT_NET in order to wake
usbd_transfer while a SOFTINT_NET lwp is blocked on softnet_lock.
It seems to me the deeper problem is that we are ever sleeping with
softnet_lock held at all, which (a) is wrong and (b) means it is wrong
to acquire softnet_lock in a softint.