Date: Fri, 1 Jun 2018 12:47:32 +0900
From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: 6bone%6bone.informatik.uni-leipzig.de@localhost
Cc: msaitoh%execsw.org@localhost, Martin Husemann <martin%duskware.de@localhost>,
current-users%netbsd.org@localhost
Subject: Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg
drivers (?))
The same diff is at:
http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif
Updated patch (Fix compile error and ixv patch):
------------------------------------------
Don't call ixgbe_rearm_queues() in ixgbe_local_timer1(). ixgbe_enable_queue()
and ixgbe_disable_queue() try to enable/disable queue interrupt safely. It
has the internal counter. When a queue's MSI-X is received, ixgbe_msix_que()
is called (IPL_NET). This function disable the queue's interrupt by
ixgbe_disable_queue() and issues an softint. ixgbe_handle() queue is called by
the softint (IPL_SOFTNET), process TX,RX and call ixgbe_enable_queue() at the
end.
ixgbe_local_timer1() is a callout and run always on CPU 0 (IPL_SOFTCLOCK).
When ixgbe_rearm_queues() called, an MSI-X interrupt is issued for a specific
queue. It may not CPU 0. If this interrupt's ixgbe_msix_que() is called
and sofint_schedule() is called before the last sofint's softint_execute()
is not called, the softint_schedule() fails because of SOFTINT_PENDING.
It result in breaking ixgbe_{enable,disable}_queue()'s internal counter.
ixgbe_local_timer1() is written not to call ixgbe_rearm_queues() if
the interrupt is disabled, but it's called because of unknown bug or a race.
One solution is to not to use the internal counter, but it's little difficult.
Another solution is stop using ixgbe_rearm_queues() at all. Essentially,
ixgbe_rearm_queues() is not required (it was added in ixgbe.c rev. 1.43
(2016/12/01)). ixgbe_rearm_queues() helps for lost interrupt problem but
I've never seen it other than ixgbe_rearm_queues() problem.
Index: ixgbe.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.158
diff -u -p -r1.158 ixgbe.c
--- ixgbe.c 30 May 2018 09:17:17 -0000 1.158
+++ ixgbe.c 1 Jun 2018 03:22:05 -0000
@@ -4411,6 +4411,7 @@ ixgbe_local_timer1(void *arg)
/* Only truely watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0 /* XXX Avoid unexpectedly disabling interrupt forever (PR#53294) */
else if (queues != 0) { /* Force an IRQ on queues with work */
que = adapter->queues;
for (i = 0; i < adapter->num_queues; i++, que++) {
@@ -4421,6 +4422,7 @@ ixgbe_local_timer1(void *arg)
mutex_exit(&que->dc_mtx);
}
}
+#endif
out:
callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
@@ -6643,7 +6645,7 @@ ixgbe_handle_link(void *context)
/************************************************************************
* ixgbe_rearm_queues
************************************************************************/
-static void
+static __inline void
ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
{
u32 mask;
Index: ixv.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixv.c,v
retrieving revision 1.102
diff -u -p -r1.102 ixv.c
--- ixv.c 30 May 2018 08:35:26 -0000 1.102
+++ ixv.c 1 Jun 2018 03:22:05 -0000
@@ -1266,9 +1266,11 @@ ixv_local_timer_locked(void *arg)
/* Only truly watchdog if all queues show hung */
if (hung == adapter->num_queues)
goto watchdog;
+#if 0
else if (queues != 0) { /* Force an IRQ on queues with work */
ixv_rearm_queues(adapter, queues);
}
+#endif
callout_reset(&adapter->timer, hz, ixv_local_timer, adapter);
------------------------------------------
The same diff is at:
http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180531-0.dif
--
-----------------------------------------------
SAITOH Masanobu (msaitoh%execsw.org@localhost
msaitoh%netbsd.org@localhost)