Source-Changes-HG archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

[src/trunk]: src/sys pull across abort fixes from nick-nhusb. add more abort...



details:   https://anonhg.NetBSD.org/src/rev/84a31f85e4e0
branches:  trunk
changeset: 363712:84a31f85e4e0
user:      mrg <mrg%NetBSD.org@localhost>
date:      Thu Aug 09 06:26:47 2018 +0000

description:
pull across abort fixes from nick-nhusb.  add more abort fixes, using
ideas from Taylor and Nick, and myself.  special thanks to both who
inspired much of the code here, if not wrote it directly.

among other problems, this assert should no longer trigger:

   panic: kernel diagnostic assertion "xfer->ux_state == XFER_ONQU" failed: file "/current/src/sys/dev/usb/usbdi.c", line 914

using usbhist i was able to track down my instance of it being related
to userland close() beginning, dropping the sc_lock, and then the usb
softintr completes the transfer normally, and when it is done, the
abort path attempts to re-complete the transfer, and the above assert
is tripped.


changes from nhusb were commited with these logs:
--
Move the struct usb_task to struct usbd_xfer for everyone to use.
--
Set device transfer status to USBD_IN_PROGRESS if start methods succeeds
--
Actually set the transfer status on transfers in ohci_abort_xfer and
the controller is dying
--
Don't supply the lock to callout_halt when polling as it won't be held
--
Improve transfer abort
--
Mark device transfers as USBD_IN_PROGRESS appropriately and improve
abort handling
--
#ifdef DIAGNOSTIC -> KASSERT and add another KASSERT
--
Mark device transfers as USBD_IN_PROGRESS appropriately and improve
abort handling
--

additional changes include:
- initialise the usb abort task in the HCI allocx routine, so that it
  can be safely usb_rem_task()'d.
- rework the handling of softintr vs cancellation vs timeout abort based
  upon a scheme from Taylor:
  when completing a transfer normally:
  - if the status is not in progress, it must be cancelled or timed out,
    and we should not process this xfer.
  - set the status as normal.
  - unconditionallly callout_stop() and usb_rem_task().  they're safe and
    either aren't running, or will run and do nothing.
  - finally call usb_transfer_complete().
  when aborting a transfer:
  - status should be cancelled or timed out.
  - if cancelling, callout_halt and usb_rem_task_wait() to make sure the
    timer is either done or cancelled.
  - at this point, the ux_status must not be cancelled or timed out, and
    if it is not in progress we're done.
  - set the status.
  - if the controller is dying, just return.
  - perform HCI-specific tasks to abort this xfer.
  - finally call usb_transfer_complete().
  for the timeout and timeout task:
  - if the HCI is not dying, and the ux_status is in progress, then
    trigger the usb abort task.
- remove UXFER_ABORTWAIT and UXFER_ABORTING.

tested on:
- multiple PC systems with several types of devices: ugen/UPS, ucom,
  umass with disk, ssd and cdrom backends, kbd, ms, using uhci, ehci
  and xhci.
- erlite3: sd@umass on dwc2.
- sunblade2000: kbd/ms and umass disk on ohci.

untested:
- motg, slhci and ahci.  motg has some portion of the new scheme
  applied, but slhci and ahci require more study.

future work includes pushing a lot of the common abort handling into
usbdi.c and leaving upm_abort() for HC specific tasks, but this change
is pullup-able to netbsd-7 and netbsd-8 as it does not change any
external API, as well as removing over 100 lines of code while adding
over 30 new asserts.

XXX: pullup-7, pullup-8.

diffstat:

 sys/dev/usb/ehci.c           |  263 +++++++++++++++++++++---------------------
 sys/dev/usb/ehcivar.h        |    5 +-
 sys/dev/usb/motg.c           |   51 +++++--
 sys/dev/usb/ohci.c           |  221 +++++++++++++++++++++--------------
 sys/dev/usb/ohcivar.h        |    6 +-
 sys/dev/usb/uhci.c           |  186 ++++++++++++++++--------------
 sys/dev/usb/uhcivar.h        |    6 +-
 sys/dev/usb/usbdi.c          |   12 +-
 sys/dev/usb/usbdivar.h       |    7 +-
 sys/dev/usb/xhci.c           |  159 +++++++++++++++----------
 sys/dev/usb/xhcivar.h        |    4 +-
 sys/external/bsd/dwc2/dwc2.c |  128 ++++++++++++--------
 12 files changed, 586 insertions(+), 462 deletions(-)

diffs (truncated from 1969 to 300 lines):

diff -r ae84e2526a69 -r 84a31f85e4e0 sys/dev/usb/ehci.c
--- a/sys/dev/usb/ehci.c        Wed Aug 08 22:16:49 2018 +0000
+++ b/sys/dev/usb/ehci.c        Thu Aug 09 06:26:47 2018 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: ehci.c,v 1.259 2018/06/06 01:49:09 maya Exp $ */
+/*     $NetBSD: ehci.c,v 1.260 2018/08/09 06:26:47 mrg Exp $ */
 
 /*
  * Copyright (c) 2004-2012 The NetBSD Foundation, Inc.
@@ -53,7 +53,7 @@
  */
 
 #include <sys/cdefs.h>
-__KERNEL_RCSID(0, "$NetBSD: ehci.c,v 1.259 2018/06/06 01:49:09 maya Exp $");
+__KERNEL_RCSID(0, "$NetBSD: ehci.c,v 1.260 2018/08/09 06:26:47 mrg Exp $");
 
 #include "ohci.h"
 #include "uhci.h"
@@ -412,8 +412,7 @@
 
        mutex_init(&sc->sc_lock, MUTEX_DEFAULT, IPL_SOFTUSB);
        mutex_init(&sc->sc_intr_lock, MUTEX_DEFAULT, IPL_USB);
-       cv_init(&sc->sc_softwake_cv, "ehciab");
-       cv_init(&sc->sc_doorbell, "ehcidi");
+       cv_init(&sc->sc_doorbell, "ehcidb");
 
        sc->sc_xferpool = pool_cache_init(sizeof(struct ehci_xfer), 0, 0, 0,
            "ehcixfer", NULL, IPL_USB, NULL, NULL, NULL);
@@ -757,6 +756,7 @@
 ehci_doorbell(void *addr)
 {
        ehci_softc_t *sc = addr;
+       EHCIHIST_FUNC(); EHCIHIST_CALLED();
 
        mutex_enter(&sc->sc_lock);
        cv_broadcast(&sc->sc_doorbell);
@@ -859,11 +859,6 @@
            !TAILQ_EMPTY(&sc->sc_intrhead))
                callout_reset(&sc->sc_tmo_intrlist,
                    hz, ehci_intrlist_timeout, sc);
-
-       if (sc->sc_softwake) {
-               sc->sc_softwake = 0;
-               cv_broadcast(&sc->sc_softwake_cv);
-       }
 }
 
 Static void
@@ -945,7 +940,6 @@
        }
  done:
        DPRINTFN(10, "ex=%#jx done", (uintptr_t)ex, 0, 0, 0);
-       callout_stop(&ex->ex_xfer.ux_callout);
        ehci_idone(ex, cq);
 }
 
@@ -992,7 +986,6 @@
        return;
 done:
        DPRINTF("ex %#jx done", (uintptr_t)ex, 0, 0, 0);
-       callout_stop(&ex->ex_xfer.ux_callout);
        ehci_idone(ex, cq);
 }
 
@@ -1030,7 +1023,6 @@
                return;
 
        DPRINTFN(10, "ex=%#jx done", (uintptr_t)ex, 0, 0, 0);
-       callout_stop(&(ex->ex_xfer.ux_callout));
        ehci_idone(ex, cq);
 }
 
@@ -1038,25 +1030,39 @@
 Static void
 ehci_idone(struct ehci_xfer *ex, ex_completeq_t *cq)
 {
+       EHCIHIST_FUNC(); EHCIHIST_CALLED();
        struct usbd_xfer *xfer = &ex->ex_xfer;
        struct ehci_pipe *epipe = EHCI_XFER2EPIPE(xfer);
        struct ehci_softc *sc = EHCI_XFER2SC(xfer);
        ehci_soft_qtd_t *sqtd, *fsqtd, *lsqtd;
        uint32_t status = 0, nstatus = 0;
        int actlen = 0;
-
-       EHCIHIST_FUNC(); EHCIHIST_CALLED();
-
-       KASSERT(sc->sc_bus.ub_usepolling || mutex_owned(&sc->sc_lock));
+       bool polling = sc->sc_bus.ub_usepolling;
+
+       KASSERT(polling || mutex_owned(&sc->sc_lock));
 
        DPRINTF("ex=%#jx", (uintptr_t)ex, 0, 0, 0);
 
-       if (xfer->ux_status == USBD_CANCELLED ||
-           xfer->ux_status == USBD_TIMEOUT) {
+       /*
+        * If software has completed it, either by cancellation
+        * or timeout, drop it on the floor.
+        */
+       if (xfer->ux_status != USBD_IN_PROGRESS) {
+               KASSERT(xfer->ux_status == USBD_CANCELLED ||
+                   xfer->ux_status == USBD_TIMEOUT);
                DPRINTF("aborted xfer=%#jx", (uintptr_t)xfer, 0, 0, 0);
                return;
        }
 
+       /*
+        * Cancel the timeout and the task, which have not yet
+        * run.  If they have already fired, at worst they are
+        * waiting for the lock.  They will see that the xfer
+        * is no longer in progress and give up.
+        */
+       callout_stop(&xfer->ux_callout);
+       usb_rem_task(xfer->ux_pipe->up_dev, &xfer->ux_aborttask);
+
 #ifdef DIAGNOSTIC
 #ifdef EHCI_DEBUG
        if (ex->ex_isdone) {
@@ -1340,7 +1346,6 @@
                kmem_free(sc->sc_softitds,
                    sc->sc_flsize * sizeof(ehci_soft_itd_t *));
        cv_destroy(&sc->sc_doorbell);
-       cv_destroy(&sc->sc_softwake_cv);
 
 #if 0
        /* XXX destroyed in ehci_pci.c as it controls ehci_intr access */
@@ -1519,6 +1524,10 @@
        xfer = pool_cache_get(sc->sc_xferpool, PR_WAITOK);
        if (xfer != NULL) {
                memset(xfer, 0, sizeof(struct ehci_xfer));
+
+               /* Initialise this always so we can call remove on it. */
+               usb_init_task(&xfer->ux_aborttask, ehci_timeout_task, xfer,
+                   USB_TASKQ_MPSAFE);
 #ifdef DIAGNOSTIC
                struct ehci_xfer *ex = EHCI_XFER2EXFER(xfer);
                ex->ex_isdone = true;
@@ -2172,6 +2181,7 @@
                DPRINTF("dying", 0, 0, 0, 0);
                return;
        }
+
        /* ask for doorbell */
        EOWRITE4(sc, EHCI_USBCMD, EOREAD4(sc, EHCI_USBCMD) | EHCI_CMD_IAAD);
        DPRINTF("cmd = 0x%08jx sts = 0x%08jx",
@@ -3103,20 +3113,24 @@
 }
 
 /*
- * Abort a device request.
- * If this routine is called at splusb() it guarantees that the request
- * will be removed from the hardware scheduling and that the callback
- * for it will be called with USBD_CANCELLED status.
- * It's impossible to guarantee that the requested transfer will not
- * have happened since the hardware runs concurrently.
- * If the transaction has already happened we rely on the ordinary
- * interrupt processing to process it.
- * XXX This is most probably wrong.
- * XXXMRG this doesn't make sense anymore.
+ * Cancel or timeout a device request.  We have two cases to deal with
+ *
+ * 1) A driver wants to stop scheduled or inflight transfers
+ * 2) A transfer has timed out
+ *
+ * have (partially) happened since the hardware runs concurrently.
+ *
+ * Transfer state is protected by the bus lock and we set the transfer status
+ * as soon as either of the above happens (with bus lock held).
+ *
+ * Then we arrange for the hardware to tells us that it is not still
+ * processing the TDs by setting the QH halted bit and wait for the ehci
+ * door bell
  */
 Static void
 ehci_abort_xfer(struct usbd_xfer *xfer, usbd_status status)
 {
+       EHCIHIST_FUNC(); EHCIHIST_CALLED();
        struct ehci_pipe *epipe = EHCI_XFER2EPIPE(xfer);
        struct ehci_xfer *exfer = EHCI_XFER2EXFER(xfer);
        ehci_softc_t *sc = EHCI_XFER2SC(xfer);
@@ -3125,48 +3139,58 @@
        ehci_physaddr_t cur;
        uint32_t qhstatus;
        int hit;
-       int wake;
-
-       EHCIHIST_FUNC(); EHCIHIST_CALLED();
+
+       KASSERTMSG((status == USBD_CANCELLED || status == USBD_TIMEOUT),
+           "invalid status for abort: %d", (int)status);
 
        DPRINTF("xfer=%#jx pipe=%#jx", (uintptr_t)xfer, (uintptr_t)epipe, 0, 0);
 
        KASSERT(mutex_owned(&sc->sc_lock));
        ASSERT_SLEEPABLE();
 
-       if (sc->sc_dying) {
-               /* If we're dying, just do the software part. */
-               xfer->ux_status = status;       /* make software ignore it */
-               callout_stop(&xfer->ux_callout);
-               usb_transfer_complete(xfer);
-               return;
+       if (status == USBD_CANCELLED) {
+               /*
+                * We are synchronously aborting.  Try to stop the
+                * callout and task, but if we can't, wait for them to
+                * complete.
+                */
+               callout_halt(&xfer->ux_callout, &sc->sc_lock);
+               usb_rem_task_wait(xfer->ux_pipe->up_dev, &xfer->ux_aborttask,
+                   USB_TASKQ_HC, &sc->sc_lock);
+       } else {
+               /* Otherwise, we are timing out.  */
+               KASSERT(status == USBD_TIMEOUT);
        }
 
        /*
-        * If an abort is already in progress then just wait for it to
-        * complete and return.
+        * The xfer cannot have been cancelled already.  It is the
+        * responsibility of the caller of usbd_abort_pipe not to try
+        * to abort a pipe multiple times, whether concurrently or
+        * sequentially.
         */
-       if (xfer->ux_hcflags & UXFER_ABORTING) {
-               DPRINTF("already aborting", 0, 0, 0, 0);
-#ifdef DIAGNOSTIC
-               if (status == USBD_TIMEOUT)
-                       printf("ehci_abort_xfer: TIMEOUT while aborting\n");
-#endif
-               /* Override the status which might be USBD_TIMEOUT. */
-               xfer->ux_status = status;
-               DPRINTF("waiting for abort to finish", 0, 0, 0, 0);
-               xfer->ux_hcflags |= UXFER_ABORTWAIT;
-               while (xfer->ux_hcflags & UXFER_ABORTING)
-                       cv_wait(&xfer->ux_hccv, &sc->sc_lock);
+       KASSERT(xfer->ux_status != USBD_CANCELLED);
+
+       /* Only the timeout, which runs only once, can time it out.  */
+       KASSERT(xfer->ux_status != USBD_TIMEOUT);
+
+       /* If anyone else beat us, we're done.  */
+       if (xfer->ux_status != USBD_IN_PROGRESS)
                return;
-       }
-       xfer->ux_hcflags |= UXFER_ABORTING;
+
+       /* We beat everyone else.  Claim the status.  */
+       xfer->ux_status = status;
 
        /*
-        * Step 1: Make interrupt routine and hardware ignore xfer.
+        * If we're dying, skip the hardware action and just notify the
+        * software that we're done.
         */
-       xfer->ux_status = status;       /* make software ignore it */
-       callout_stop(&xfer->ux_callout);
+       if (sc->sc_dying) {
+               goto dying;
+       }
+
+       /*
+        * HC Step 1: Make interrupt routine and hardware ignore xfer.
+        */
        ehci_del_intr_list(sc, exfer);
 
        usb_syncmem(&sqh->dma,
@@ -3202,17 +3226,13 @@
        }
 
        /*
-        * Step 2: Wait until we know hardware has finished any possible
-        * use of the xfer.  Also make sure the soft interrupt routine
-        * has run.
+        * HC Step 2: Wait until we know hardware has finished any possible
+        * use of the xfer.
         */
        ehci_sync_hc(sc);
-       sc->sc_softwake = 1;
-       usb_schedsoftintr(&sc->sc_bus);
-       cv_wait(&sc->sc_softwake_cv, &sc->sc_lock);
 
        /*
-        * Step 3: Remove any vestiges of the xfer from the hardware.
+        * HC Step 3: Remove any vestiges of the xfer from the hardware.
         * The complication here is that the hardware may have executed
         * beyond the xfer we're trying to abort.  So as we're scanning
         * the TDs of this xfer we check if the hardware points to
@@ -3253,17 +3273,14 @@
        }
 
        /*
-        * Step 4: Execute callback.
+        * Final step: Notify completion to waiting xfers.
         */
+dying:
 #ifdef DIAGNOSTIC
        exfer->ex_isdone = true;
 #endif
-       wake = xfer->ux_hcflags & UXFER_ABORTWAIT;



Home | Main Index | Thread Index | Old Index