Subject: Re: Getting "TLB IPI rendezvous failed..."
To: Frank van der Linden <fvdl@netbsd.org>
From: Stephan Uphoff <ups@tree.com>
List: tech-kern
Date: 01/11/2005 15:37:56

On Thu, 2004-12-23 at 07:50, Frank van der Linden wrote:
> On Thu, Dec 23, 2004 at 12:56:26AM -0600, Frederick Bruckman wrote:
> > 2) The general pattern seems to be that one cpu is at spipl(), waiting
> > for a lock, while the other cpu insists on doing something to the first
> > cpu, and has no way to back off? I wonder why it's only i386.
>
> That's the general deadlock pattern: one CPU is at a very high spl
> (splipi, which is the highest possible), waiting to acquire a lock. Another
> CPU holds the lock, and has to do something which involves sending an IPI
> and waiting for the other CPUs to receive it. But, the first CPU never
> gets it.
>
> I don't know why this problem has resurfaced recently for some people.
>
> Manuel is right, collecting the traces is the most important thing, it
> will show where the CPUs get stuck.
>
> - Frank
>
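I think sending an IPI needs to be protected with splclock(), since
some interrupt handlers may themselves send IPIs and could otherwise
interrupt a send that is already in progress.
x86_broadcast_ipi() should also go through x86_ipi() so that it
benefits from i82489_icr_wait().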
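
To make the idea concrete, here is a rough user-space model of it. This
is not kernel code and not the actual fix: every model_* name, the
enum of levels and the busy flag are made up for illustration, and the
real spl and APIC code looks different. It only shows the shape: raise
the level to clock before touching the send path, wait for the
previous command to drain, send, and restore the level afterwards,
with the broadcast path reusing the single-send path.

```c
/* User-space model only; all names below are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Modelled interrupt priority levels, ordered low to high. */
enum ipl { IPL_NONE = 0, IPL_NET, IPL_CLOCK, IPL_IPI };

static enum ipl cur_ipl = IPL_NONE; /* modelled level of this CPU */
static bool icr_busy = false;       /* modelled "delivery pending" state */
static int ipis_sent = 0;           /* number of modelled sends */

/* Raise the level (never lower it) and return the previous level. */
static enum ipl model_splraise(enum ipl want)
{
    enum ipl old = cur_ipl;
    if (want > cur_ipl)
        cur_ipl = want;
    return old;
}

/* Restore a level previously returned by model_splraise(). */
static void model_splx(enum ipl old)
{
    cur_ipl = old;
}

/* An interrupt is deliverable only if its level is above the current one. */
static bool model_interrupt_allowed(enum ipl level)
{
    return level > cur_ipl;
}

/* Stand-in for waiting until the previous command has been delivered.
 * The model simply marks the command as drained. */
static void model_icr_wait(void)
{
    icr_busy = false;
}

/* Send one IPI with clock-level and lower interrupts blocked. */
static int model_send_ipi(void)
{
    enum ipl s = model_splraise(IPL_CLOCK);

    /* No handler at clock level or below can run and start its own send. */
    assert(!model_interrupt_allowed(IPL_CLOCK));

    model_icr_wait();   /* let any earlier command drain first */
    icr_busy = true;    /* issue the new command */
    ipis_sent++;
    model_icr_wait();   /* wait for this command to be delivered */

    model_splx(s);      /* back to the caller's level */
    return ipis_sent;
}

/* Broadcast reuses the single-send path, so it gets the same wait. */
static int model_broadcast_ipi(void)
{
    return model_send_ipi();
}
```

Calling model_broadcast_ipi() from level IPL_NONE leaves the level at
IPL_NONE afterwards, and a caller already at IPL_IPI stays there, since
the raise never lowers the level.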
I should be able to test a fix over the next weekend.
Stephan