NetBSD-Bugs archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: kern/57644: ixl(4) may panic or hang
The following reply was made to PR kern/57644; it has been noted by GNATS.
From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Manuel.Bouyer%lip6.fr@localhost
Cc: gnats-bugs%NetBSD.org@localhost, knakahara%NetBSD.org@localhost, yamaguchi%NetBSD.org@localhost,
rin%NetBSD.org@localhost
Subject: Re: kern/57644: ixl(4) may panic or hang
Date: Fri, 6 Oct 2023 09:43:27 +0000
bouyer: Can you try a current kernel and reproduce the hang? This
should trigger a heartbeat panic in current, which should give us more
diagnostic information.
knakahara, yamaguchi, rin: This can't be right:
ixl_atq_set(&sc->sc_link_state_atq, ixl_get_link_status_done);
The function passed to ixl_atq_set is only ever called from interrupt
context via ixl_intr/ixl_other_intr -> ixl_atq_done ->
ixl_atq_done_locked.
ixl_get_link_status_done is wrong here for two reasons:
1. It releases and reacquires sc->sc_atq_lock, but ixl_atq_done_locked
does not appear to be prepared for iatq_fn to release and reacquire
the lock -- it may silently use stale cached values of
sc->sc_atq_prod/cons.
2. It calls ixl_link_state_update which takes sc->sc_cfg_lock which is
an adaptive lock, forbidden in interrupt context -- that's what led
to this crash.
ixl_atq_exec_locked is also wrong because it doesn't wait for any
particular condition with cv_timedwait; it may spuriously wake up
early, or it may report timeout even though the interrupt was
delivered. (cv_timedwait is a bad API, very hard to use correctly.)
If ixl_get_link_status_done has to be done asynchronously, perhaps we
can resolve both (1) and (2) by creating a new lock, say sc_mii_lock,
with the following lock order:
sc_cfg_lock => sc_atq_lock => sc_mii_lock
This sc_mii_lock would be used for all the mii/ifmedia logic, such as
ifmedia_init_with_lock. ixl_get_link_status_done would no longer need
to release and reacquire sc_atq_lock, and ixl_link_status_update would
no longer need to take sc_cfg_lock.
It doesn't resolve the issue with ixl_wakeup and cv_timedwait, though.
Perhaps a single bit of state inside the iatq entry or something would
be enough -- something like this:
/* ixl_wakeup */
iatq->iatq_done = 1;
cv_broadcast(&sc->sc_atq_cv);
/* ixl_atq_exec_locked */
const unsigned start = getticks();
iatq->iatq_done = 0;
ixl_atq_set(iatq, ixl_wakeup);
error = ixl_atq_post_locked(sc, iatq);
if (error)
return error;
while (!iatq->iatq_done) {
const unsigned slept = getticks() - start;
if (slept >= IXL_ATQ_EXEC_TIMEOUT)
return ETIMEDOUT;
(void)cv_timedwait(&sc->sc_atq_cv, &sc->sc_atq_lock,
IXL_ATQ_EXEC_TIMEOUT - slept);
}
return 0;
Home |
Main Index |
Thread Index |
Old Index