Date: Fri, 25 Jan 2008 15:23:21 +0000
From: Steven M. Bellovin <smb%cs.columbia.edu@localhost>
To: current-users%netbsd.org@localhost
Subject: interrupt storm after resume on Thinkpad T61
Any thoughts on the interrupt storm problem? Here's a current 'top'
from my machine:
load averages: 0.02, 0.05, 0.01 up 0 days, 11:48 10:17:11
67 processes: 66 sleeping, 1 on CPU
CPU0 states: 0.5% user, 0.0% nice, 0.2% system, 75.3% interrupt, 23.9% idle
CPU1 states: 0.2% user, 0.0% nice, 0.3% system, 0.0% interrupt, 99.5% idle
Memory: 435M Act, 205M Inact, 6976K Wired, 61M Exec, 180M File, 1772M Free
Swap: 4097M Total, 4097M Free
  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
20896 smb       85    0    19M   95M select/1   0:23  0.29%  0.29% claws-mail
26181 smb       85    0   404K 6164K select/0   0:00  0.11%  0.10% xterm
 1266 smb       85    0  1904K   85M select/1   3:30  0.00%  0.00% Xorg
 1149 smb       85    0    16M  214M select/1   2:29  0.00%  0.00% firefox-bin
From 'vmstat -i', I see that it's pin 17:
# vmstat -i
interrupt                    total  rate
softint net/0               141892     3
softint bio/0               212461     5
softint bio block/0            290     0
softint clk/0              2322719    54
cpu0 timer                40316586   949
cpu0 FPU flush IPI              84     0
cpu0 FPU synch IPI            5110     0
cpu0 MTRR update IPI             2     0
cpu0 MSR write IPI             105     0
global TLB IPI             1921145    45
cpu0 TLB IPI                 62927     1
softint net/1                30950     0
softint clk/1                   30     0
cpu1 timer                40293969   948
cpu1 FPU flush IPI             173     0
cpu1 FPU synch IPI            6665     0
cpu1 MTRR update IPI            21     0
cpu1 MSR write IPI             866     0
cpu1 ACPI CPU sleep IPI          2     0
cpu1 TLB IPI                 56061     1
ioapic0 pin 9               286526     6
ioapic0 pin 1                10241     0
ioapic0 pin 12              418841     9
ioapic0 pin 20               47663     1
ioapic0 pin 22                  70     0
ioapic0 pin 17           121743653  2865
ioapic0 pin 19                   5     0
ioapic0 pin 14              211279     4
Total                    208090336  4898
and dmesg shows this:
azalia0: interrupting at ioapic0 pin 17 (irq 11)
wpi0: interrupting at ioapic0 pin 17 (irq 11)
uhci3: interrupting at ioapic0 pin 17 (irq 11)
fwohci0: interrupting at ioapic0 pin 17 (irq 11)
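For anyone who wants to pick the busiest source out of 'vmstat -i' output
mechanically, here is a minimal Python sketch. The function names
(parse_vmstat_i, busiest) and the abbreviated SAMPLE text are illustrative
assumptions, not part of any NetBSD tool. It assumes only that the last two
fields on each line are the total and the rate, as in the output above.

```python
def parse_vmstat_i(text):
    """Return {source_name: (total, rate)} from `vmstat -i` style output."""
    counts = {}
    for line in text.splitlines():
        # Source names contain spaces, so split from the right:
        # everything before the last two fields is the name.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # column header or shell prompt line
        if name == "Total":
            continue  # summary row, not an interrupt source
        counts[name] = (int(total), int(rate))
    return counts


def busiest(counts):
    """Return (name, (total, rate)) for the source with the highest rate."""
    name = max(counts, key=lambda n: counts[n][1])
    return name, counts[name]


# Abbreviated copy of the output quoted above (assumption: same layout).
SAMPLE = """\
interrupt total rate
cpu0 timer 40316586 949
ioapic0 pin 17 121743653 2865
ioapic0 pin 14 211279 4
Total 208090336 4898
"""

if __name__ == "__main__":
    print(busiest(parse_vmstat_i(SAMPLE)))
```

On the numbers above this singles out ioapic0 pin 17, which accounts for
121743653 of the 208090336 interrupts counted, roughly 58%. It identifies
only the pin; since four devices share that pin, it cannot say which device
is responsible.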
Based on comments at ThinkWiki.org and
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/126369,
I suspect a USB interrupt problem. However, I'm running an even newer BIOS
than the one that is claimed to fix the problem, and I'm still seeing it.
I have not tried moving anything off of IRQ 11. Might that help? If
so, what should I try? A different value? Auto?
It would be nice if NetBSD could detect the flaky hardware and disable a
device that generates so many interrupts. That kind of CPU usage will kill
battery life (I'm running estd, and the CPU is at maximum frequency), as
well as heating the machine up and slowing down real applications.
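
One way to picture the detection idea is a rate check on each interrupt
source's cumulative counter. The sketch below is in Python for brevity and
is not how NetBSD handles interrupts; the class name StormDetector, the
2000 per second threshold, and the three-consecutive-samples rule are all
assumptions chosen for illustration.

```python
class StormDetector:
    """Flag a source whose interrupt rate stays above a threshold.

    Hypothetical sketch: the threshold and strike count are illustrative
    values, not numbers taken from any kernel.
    """

    def __init__(self, threshold_per_sec=2000, consecutive=3):
        self.threshold = threshold_per_sec
        self.consecutive = consecutive
        self.last = {}     # source -> (cumulative total, timestamp)
        self.strikes = {}  # source -> consecutive over-threshold samples

    def sample(self, source, total, now):
        """Feed a cumulative count taken at time `now` (in seconds).

        Returns True once the source has exceeded the threshold for
        `consecutive` samples in a row.
        """
        prev = self.last.get(source)
        self.last[source] = (total, now)
        if prev is None or now <= prev[1]:
            return False  # first sample, or no time elapsed: no rate yet
        rate = (total - prev[0]) / (now - prev[1])
        if rate > self.threshold:
            self.strikes[source] = self.strikes.get(source, 0) + 1
        else:
            self.strikes[source] = 0
        return self.strikes[source] >= self.consecutive


if __name__ == "__main__":
    d = StormDetector()
    # Simulate pin 17 at the 2865/s rate reported by vmstat -i.
    for t in range(4):
        print(t, d.sample("ioapic0 pin 17", 2865 * t, t))
```

With these assumed settings, a source running at the 2865/s seen on pin 17
is flagged on the third over-threshold sample, while the roughly 949/s
timer interrupts never are. Flagging a pin is the easy part: with four
devices sharing pin 17, deciding which device to disable would still need
some per-device test.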
--Steve Bellovin, http://www.cs.columbia.edu/~smb