Subject: Re: bge/ahd interrupt problems
To: Frank van der Linden <fvdl@netbsd.org>
From: Edgar Fuß <ef@math.uni-bonn.de>
List: port-amd64
Date: 03/25/2007 16:04:38
> Let me know what you see.
OK, here is what I see with DDB when the server is in the "locked up" state:
Lots of
ahd1: Timedout SCB already complete. Interrupts may not be functioning.
cpuvar:
pending: 40000000
level: D
depth: 1
IOAPIC RDRs (i.e., write 0x10+2*i to REG, read DATA):
ioapic1 (the one ahd1 is on):
0: E063
1: E064 (i.e., the receipt bit is on)
2, 3 disabled (10000)
ioapic2:
0,1,2,3: disabled
ioapic0:
00: 0700
01: 0090
03: 00D1
04: 00D0
09: A0A0
0E: 0061
0F: 0062
12: A070
13: A060
rest disabled.
Setting a breakpoint on ahd_intr: it looks like I only get interrupts for
ahd0 and none for ahd1.
I can also inspect ci_isources, but that won't tell me much as long as I
either misunderstand which source should be handling the interrupt, or
there really is confusion with regard to multiple IOAPICs.
I still have the machine more or less untouched (i.e., it still complains
about ahd1 timeouts). But I will now leave the server cellar in favour
of a bicycle ride. I can return later today if someone wants me to inspect
further hardware registers. Otherwise, I'll try to save the RAID parity and
get a crash dump. Then I'll probably run the torture test with a non-IOAPIC
kernel.
Thanks for any hints as to what's going on.
I would really like this solved next week.