Subject: Re: bge/ahd interrupt problems: partly resolved (hardware bug)
To: Frank van der Linden <fvdl@netbsd.org>
From: Edgar Fuß <ef@math.uni-bonn.de>
List: port-amd64
Date: 03/25/2007 23:36:09
> Let me know what you see.
OK, it may be a HARDWARE bug.
What I saw was that the "receipt" bit was set for pin 1 in ioapic1,
but Xintr_ioapic_level10 was not being called. Looked like a missing EOI.
All the vectors and idt entries looked reasonable.
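For anyone who wants to check the same state on their own machine, here is
a minimal sketch of decoding the low word of a redirection entry. The bit
positions follow the 82093AA register layout; the macro and function names
are made up for illustration and are not the NetBSD ones. It also assumes
that the "receipt" bit mentioned above is the Remote IRR bit (bit 14),
which is my reading, not something I have confirmed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low 32 bits of an I/O APIC redirection entry (82093AA layout). */
#define RED_VECTOR_MASK 0x000000ffu  /* bits 0-7: interrupt vector */
#define RED_DELIVS      (1u << 12)   /* delivery status: 1 = send pending */
#define RED_ACTLO       (1u << 13)   /* pin polarity: 1 = active low */
#define RED_RIRR        (1u << 14)   /* remote IRR: accepted, awaiting EOI */
#define RED_LEVEL       (1u << 15)   /* trigger mode: 1 = level, 0 = edge */
#define RED_MASKED      (1u << 16)   /* 1 = interrupt masked */

/* Hypothetical check: level-triggered, unmasked, still awaiting an EOI. */
static int
red_awaiting_eoi(uint32_t lo)
{
	return (lo & RED_LEVEL) != 0 && (lo & RED_RIRR) != 0 &&
	    (lo & RED_MASKED) == 0;
}

/* Format the interesting fields of an entry into buf (hypothetical helper). */
static int
red_describe(uint32_t lo, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "vec=0x%02x %s %s%s%s",
	    (unsigned)(lo & RED_VECTOR_MASK),
	    (lo & RED_LEVEL) ? "level" : "edge",
	    (lo & RED_ACTLO) ? "actlo" : "acthi",
	    (lo & RED_RIRR) ? " rirr" : "",
	    (lo & RED_MASKED) ? " masked" : "");
}
```

A pin for which red_awaiting_eoi() stays true while its handler is never
entered matches the symptom described above.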
Googling revealed that Linux has a strange workaround
(arch/i386/kernel/io_apic.c):
/*
 * It appears there is an erratum which affects at least version 0x11
 * of I/O APIC (that's the 82093AA and cores integrated into various
 * chipsets). Under certain conditions a level-triggered interrupt is
 * erroneously delivered as edge-triggered one but the respective IRR
 * bit gets set nevertheless. As a result the I/O unit expects an EOI
 * message but it will never arrive and further interrupts are blocked
 * from the source. The exact reason is so far unknown, but the
 * phenomenon was observed when two consecutive interrupt requests
 * from a given source get delivered to the same CPU and the source is
 * temporarily disabled in between.
 *
 * A workaround is to simulate an EOI message manually. We achieve it
 * by setting the trigger mode to edge and then to level when the edge
 * trigger mode gets detected in the TMR of a local APIC for a
 * level-triggered interrupt. We mask the source for the time of the
 * operation to prevent an edge-triggered interrupt escaping meanwhile.
 * The idea is from Manfred Spraul. --macro
 */
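To make the sequence in that comment concrete, here is a rough sketch of
the mask/edge/level dance against a software stand-in for the redirection
table. Nothing here touches real hardware, and none of it is the Linux or
NetBSD code: fake_redlo, fake_write and fake_kick_stuck_pin are
hypothetical names, and the model simply assumes that programming a pin as
edge-triggered drops its Remote IRR bit, which is the behaviour the
workaround relies on.

```c
#include <assert.h>
#include <stdint.h>

#define RED_RIRR   (1u << 14)  /* remote IRR: awaiting EOI (read-only) */
#define RED_LEVEL  (1u << 15)  /* trigger mode: 1 = level, 0 = edge */
#define RED_MASKED (1u << 16)  /* 1 = interrupt masked */
#define NPINS      24

/* Software stand-in for the redirection table low words. */
static uint32_t fake_redlo[NPINS];

/*
 * Model of a register write. Assumption: software cannot set Remote IRR,
 * the bit survives writes that keep the pin level-triggered, and it is
 * dropped once the pin is programmed as edge-triggered.
 */
static void
fake_write(unsigned pin, uint32_t val)
{
	uint32_t rirr;

	assert(pin < NPINS);
	rirr = fake_redlo[pin] & RED_RIRR;
	val &= ~RED_RIRR;
	if (val & RED_LEVEL)
		val |= rirr;
	fake_redlo[pin] = val;
}

/*
 * If the local APIC's TMR bit says the interrupt arrived as edge although
 * the pin is programmed as level, simulate the missing EOI.
 * Returns 1 if the pin was kicked, 0 otherwise.
 */
static int
fake_kick_stuck_pin(unsigned pin, int tmr_says_level)
{
	uint32_t lo;

	assert(pin < NPINS);
	lo = fake_redlo[pin];
	if (tmr_says_level || (lo & RED_LEVEL) == 0)
		return 0;
	/* Mask and switch to edge so no stray edge interrupt escapes. */
	fake_write(pin, (lo | RED_MASKED) & ~RED_LEVEL);
	/* Restore the saved entry: back to level, previous mask state. */
	fake_write(pin, lo);
	return 1;
}
```

In a real kernel the two writes would go through the I/O APIC register
window under whatever lock protects it, and tmr_says_level would come from
reading the local APIC's TMR for the vector being acknowledged.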
So (still with a breakpoint on Xintr_ioapic_level10) I set pin 1's mode
to edge, and voilà, I hit the breakpoint. I reset the mode to level,
continued, and kept hitting the breakpoint. I deleted it and, believe
it or not, the machine happily runs again.
I have no idea whether the analysis in the Linux comment is correct,
but the workaround succeeded.
Anyone in a position to write a similar workaround for NetBSD?
I will happily test.