tech-kern archive

Re: On softints, softnet_lock and sleeping (aka ipv6 vs USB network interfaces)



   Date: Sun, 6 Dec 2015 09:22:05 +0000
   From: Nick Hudson <skrll%netbsd.org@localhost>

   It seems to me that nd6_timer is either expecting too much of
   the USB stack by expecting a synchronous interface to changing
   multicast filters that doesn't sleep; or the USB stack should
   provide an asynchronous update method and any failure should be
   handled elsewhere.

One quick fix might be to change nd6_timer to call in6_purgeaddr in a
workqueue (or...task, if we had that).  It seems to me that
in6_purgeaddr is a relatively expensive operation (and I think there's
a bug: it calls callout_stop via nd6_dad_stop/nd6_dad_stoptimer when
it should probably call callout_halt), so calling it from a callout
doesn't seem right anyway.
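
To make the shape of that deferral concrete, here is a minimal
userland sketch, not kernel code.  All names in it (purge_work,
purge_queue, timer_defer_purge, worker_drain) are hypothetical, and
locking is left out entirely.  In the kernel the queue would
presumably be a workqueue(9), and the worker would call in6_purgeaddr
where the comment says so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: one address to purge later. */
struct purge_work {
	struct purge_work *next;
	int addr_id;		/* stand-in for an address pointer */
	int queued;		/* nonzero while on the queue */
};

/* Hypothetical FIFO of pending purges (no locking in this sketch). */
struct purge_queue {
	struct purge_work *head, **tailp;
	int npurged;		/* items processed by the worker so far */
};

static void
purge_queue_init(struct purge_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->npurged = 0;
}

/* Timer side: cheap and never sleeps; it only links the item in. */
static int
timer_defer_purge(struct purge_queue *q, struct purge_work *w)
{
	if (w->queued)
		return 0;	/* already pending, do not enqueue twice */
	w->queued = 1;
	w->next = NULL;
	*q->tailp = w;
	q->tailp = &w->next;
	return 1;
}

/* Worker side: runs in thread context, where sleeping is allowed. */
static int
worker_drain(struct purge_queue *q)
{
	struct purge_work *w;
	int n = 0;

	while ((w = q->head) != NULL) {
		q->head = w->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		w->queued = 0;
		/* Real code would do the expensive purge here. */
		q->npurged++;
		n++;
	}
	return n;
}
```

The point of the split is that the timer path does a constant amount
of work and cannot block, while everything that might sleep happens
in the worker.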

   Another problem in the PR is that

   1) CPU N (not 0) takes softnet_lock and requests a USB control transfer
   (which will sleep for completion)

   2) CPU 0 takes clock interrupt and nd6_timer expires.  nd6_timer starts and
   tries to take softnet lock and blocks

   3) CPU 0 also runs ipintr (not sure why) which takes softnet lock and blocks

Aside: This is probably because ipintr gets scheduled on a specified
target CPU, not on the local CPU, in pktq_enqueue...and apparently
every caller, except for bridges, specifies CPU 0.

   4) CPU 0 receives USB HC interrupt for completed control transfer from CPU N
   and schedules softint process (at IPL_SOFTNET) which never runs as the lwp
   is blocked in step 3)

   Maybe

        290 #define IPL_SOFTUSB IPL_SOFTNET

   http://nxr.netbsd.org/xref/src/sys/dev/usb/usbdi.h#290

   should be changed to IPL_SOFTBIO?

As a practical matter, I don't see how that would help -- IPL_SOFTUSB
is only ever used as a mutex ipl, and IPL_SOFT* is equivalent to
IPL_NONE for the practical purposes of mutex(9).

That aside, can softints even interrupt softints, or are the
priorities only about who goes first if two softints are scheduled
`simultaneously' (as far as softint_dispatch can discern)?

Even if softints can interrupt softints, using SOFTINT_BIO instead of
SOFTINT_NET wouldn't help here -- SOFTINT_BIO is lower-priority than
SOFTINT_NET, but you need the USB HC interrupt (I assume you mean,
e.g., ehci_softintr?) to run at higher priority than SOFTINT_NET in
order to wake usbd_transfer while a SOFTINT_NET lwp is blocked on
softnet_lock.
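
For the `who goes first' reading only, a toy model of the ordering
might look like the following.  This is not how softint_dispatch is
implemented; the level names and values (TOY_BIO, TOY_NET) and
toy_dispatch are made up for illustration.  The only thing taken from
above is that the BIO level sits below the NET level.

```c
#include <assert.h>

/* Toy levels, ordered low to high; names and values are illustrative. */
enum toy_level { TOY_BIO = 0, TOY_NET = 1, TOY_NLEVELS = 2 };

/*
 * Pick the highest pending level, clear its bit, and return it.
 * Returns -1 when nothing is pending.
 */
static int
toy_dispatch(unsigned *pending)
{
	for (int lvl = TOY_NLEVELS - 1; lvl >= 0; lvl--) {
		if (*pending & (1u << lvl)) {
			*pending &= ~(1u << lvl);
			return lvl;
		}
	}
	return -1;
}
```

With both bits pending, the NET level is picked before the BIO level,
so in this model moving the USB work down to the BIO level can only
make it run later, never sooner.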

It seems to me the deeper problem is that we ever sleep with
softnet_lock held at all, which (a) is wrong and (b) means it is wrong
to acquire softnet_lock in a softint.

