NetBSD-Bugs archive
Re: port-alpha/38335 (kernel freeze on alpha MP system)
The following reply was made to PR port-alpha/38335; it has been noted by GNATS.
From: "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost>
To: Jarle Greipsland <jarle%uninett.no@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, gnats-admin%netbsd.org@localhost
Subject: Re: port-alpha/38335 (kernel freeze on alpha MP system)
Date: Sat, 3 Oct 2009 10:25:43 -0600 (MDT)
On Fri, 2 Oct 2009, Jarle Greipsland wrote:
> "Michael L. Hitch" <mhitch%lightning.msu.montana.edu@localhost> writes:
>> It does show that what I thought had happened did indeed happen.
>>> 0xfffffc0000736ac0 <pmap_do_tlb_shootdown+224>: ldq t1,24(t3)
> [ ... ]
> Is there any more info I can gather for you, or can you just as
> easily reproduce this yourself?
I'm quite certain that in this particular case the job queue entry is
linked to itself, resulting in a hung cpu (which then hangs other cpus
because it has the shootdown queue locked for that cpu).
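To illustrate why a self-linked entry hangs the cpu, here is a minimal
sketch in plain C. It is not the pmap code: struct job and
count_jobs_checked() are hypothetical names made up for this example. A
loop that only follows the next pointer never reaches the end of the list
once an entry points back at itself, so the walk below checks for that
case and bails out instead of spinning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job record; NOT the kernel's real shootdown job structure. */
struct job {
	struct job *next;
	int id;
};

/*
 * Walk the list and count its entries.  An unchecked walk
 * ("for (j = head; j != NULL; j = j->next)") loops forever if some
 * entry has j->next == j.  This version detects that case.
 *
 * Returns the number of entries, or -1 if a self-linked entry is found.
 */
static int
count_jobs_checked(const struct job *head)
{
	int n = 0;

	for (const struct job *j = head; j != NULL; j = j->next) {
		n++;
		if (j->next == j)
			return -1;	/* entry is linked to itself */
	}
	return n;
}
```

This only catches an entry that points directly at itself, which is the
case described above; a longer cycle would need a different check.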
I haven't been able to easily reproduce this. I can complete full
builds fairly often, although a lot of the time I will get a segmentation
fault in one of the tools (typically grotty or install).
I also got another deadlock situation yesterday: cpu 0 had acquired the
lock for the shootdown queue (presumably for a different cpu - it should
be skipping the current cpu) and got interrupted by a shootdown IPI. The
IPI routine was trying to acquire the lock for the current cpu's queue,
which was currently locked (can't tell what held the lock though).
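As a sketch of how a situation like this can turn into a deadlock, here
is a toy model in plain C. Everything in it is an assumption made for
illustration: owner[], waiting[], reset_model() and deadlocked() are
hypothetical, and the model does not say who actually held the lock in
the hang I saw. It only shows that if the holder of cpu 0's queue lock
is itself waiting on the lock that cpu 0 holds, nobody can make progress.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU	2
#define NONE	(-1)

/* owner[q]: cpu holding the lock of queue q, or NONE (hypothetical model) */
static int owner[NCPU];
/* waiting[c]: queue whose lock cpu c is spinning on, or NONE */
static int waiting[NCPU];

/* Put the model into the "nothing held, nobody waiting" state. */
static void
reset_model(void)
{
	for (int i = 0; i < NCPU; i++) {
		owner[i] = NONE;
		waiting[i] = NONE;
	}
}

/*
 * Follow the chain "cpu waits on queue q, q is held by cpu o, o waits
 * on ...".  If the chain leads back to the starting cpu, that cpu can
 * never get its lock: the holder is (indirectly) waiting on it.
 */
static bool
deadlocked(int cpu)
{
	int c = cpu;

	for (int steps = 0; steps < NCPU; steps++) {
		int q = waiting[c];
		if (q == NONE)
			return false;	/* c is not waiting on anything */
		int o = owner[q];
		if (o == NONE)
			return false;	/* lock is free, c will get it */
		if (o == cpu)
			return true;	/* chain closed back on the start */
		c = o;
	}
	return false;
}
```

In this model, the case above would be owner[1] = 0 (cpu 0 holds the
lock for cpu 1's queue) and waiting[0] = 0 (the IPI routine on cpu 0
spins on cpu 0's own queue lock). Whether that is fatal depends on what
the unknown holder of queue 0's lock is doing.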
One thing I haven't tried yet is a LOCKDEBUG kernel, which should do
some additional checking on locking, and should be able to provide
information on what holds the lock. The last time I tried LOCKDEBUG, I
ran into problems and quickly got lost in the locking morass.
>> This is the problem I'm still trying to figure out and find a fix
>> for. The patch I posted previously is a
>> workaround to detect this particular problem, and will display a message
>> if it occurs.
> It _does_ occur. I applied the patch, and a 'build.sh -j4'
> triggered the panic after a while.
What was the panic you got?
--
Michael L. Hitch mhitch%montana.edu@localhost
Computer Consultant
Information Technology Center
Montana State University Bozeman, MT USA