Subject: kern/32162: [netbsd-3.0] kernel dead-lock in MP system
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 11/25/2005 03:13:00
>Number: 32162
>Category: kern
>Synopsis: kernel dead-lock in MP system
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 25 03:13:00 +0000 2005
>Originator: Andreas Wrede
>Release: NetBSD 3.0_RC2
>Organization:
Planix, Inc.
>Environment:
System: NetBSD whome.planix.com 3.0_RC3 NetBSD 3.0_RC3 (PLANIX.MPACPI) #0: Thu Nov 24 20:57:09 EST 2005 root@whome.planix.com:/u1/netbsd-3.0/src/sys/arch/i386/compile/obj.i386/PLANIX.MPACPI i386
Architecture: i386
Machine: i386
>Description:
Over the last week I have experienced three kernel dead-locks on a NetBSD 3.0_RC1/2/3 system.
The motherboard is a Tyan K8S Pro S2882G3NR with two AMD Opteron 244 CPUs installed. The kernel
differs from GENERIC.MPACPI in the values of some SYSVSEM variables, maxusers, and some
other variables.
Backtraces from both CPUs, taken during the most recent dead-lock:
Thu Nov 24 19:52:28 2005
Stopped in pid 11938.1 (ps) at netbsd:cpu_Debugger+0x4: leave
db{0}> bt
cpu_Debugger(c07e6980,db133804,cd11dda4,202,0) at netbsd:cpu_Debugger+0x4
comintr(c1cb4800,8,10,c1a80030,10) at netbsd:comintr+0x672
Xintr_ioapic_edge4() at netbsd:Xintr_ioapic_edge4+0x9c
--- interrupt ---
_kernel_lock(42,10,c1a80030,10,10) at netbsd:_kernel_lock+0x81
x86_softintlock(c0839854,c07e6980,7,cd11de90,c039b7ed) at netbsd:x86_softintlock+0xd
DDB lost frame for netbsd:Xsoftserial+0x18, trying 0xcd11de54
Xsoftserial() at netbsd:Xsoftserial+0x18
--- interrupt ---
0xcd11deb0:
db{0}> machine cpu 1
using CPU 1
db{0}> bt
lockmgr(cfe6739c,1,0,0,cb7ad738) at netbsd:lockmgr+0x6a
uvmfault_lookup(cd144ac0,0,0,cd144af4,c039ad91) at netbsd:uvmfault_lookup+0x1aa
uvm_fault(cfe67398,8062000,0,2,2) at netbsd:uvm_fault+0x62
trap() at netbsd:trap+0x33c
--- trap (number 6) ---
i486_copyout(cc04b000,52c,cd144d14,c039ad91,1a000) at netbsd:i486_copyout+0x3d
ffs_read(cd144cb4,cb7b634c,10001,0,c0634560) at netbsd:ffs_read+0x422
VOP_READ(cb7b634c,cd144d14,1,cb7a063c,0) at netbsd:VOP_READ+0x34
vn_rdwr(0,cb7b634c,8062000,52c,1a000) at netbsd:vn_rdwr+0xb4
vmcmd_readvn(db133ccc,c230611c,bfc00000,0,0) at netbsd:vmcmd_readvn+0x2f
sys_execve(cd0349e4,cd144f64,cd144f5c,3b,c1ac9800) at netbsd:sys_execve+0x772
syscall_plain() at netbsd:syscall_plain+0x17e
--- syscall (number 59) ---
0xbdb2b13f:
db{0}> machine cpu
addr dev id flags ipis curproc fpcurproc
0xc07e6980 cpu0 0 3009 0 0xcef118e4 0xcef118e4
0xc1ac9800 cpu1 1 f002 0 0xcd0349e4 0xce5cac70
dmesg output:
NetBSD 3.0_RC3 (PLANIX.MPACPI) #0: Thu Nov 24 20:57:09 EST 2005
root@whome.planix.com:/u1/netbsd-3.0/src/sys/arch/i386/compile/obj.i386/PLANIX.MPACPI
total memory = 1023 MB
avail memory = 982 MB
BIOS32 rev. 0 found at 0xf0010
mainbus0 (root)
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Unknown K7 (Athlon) (686-class), 1793.12 MHz, id 0xf5a
cpu0: features 78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 78bfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,MMX>
cpu0: features 78bfbff<FXSR,SSE,SSE2>
cpu0: "AMD Opteron(tm) Processor 244"
cpu0: calibrating local timer
cpu0: apic clock running at 199 MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: starting
cpu1: AMD Unknown K7 (Athlon) (686-class), 1792.97 MHz, id 0xf5a
cpu1: features 78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features 78bfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,MMX>
cpu1: features 78bfbff<FXSR,SSE,SSE2>
cpu1: "AMD Opteron(tm) Processor 244"
ioapic0 at mainbus0 apid 2 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 24 pins
ioapic1 at mainbus0 apid 3 (I/O APIC)
ioapic1: pa 0xfebff000, version 11, 4 pins
ioapic2 at mainbus0 apid 4 (I/O APIC)
ioapic2: pa 0xfebfe000, version 11, 4 pins
acpi0 at mainbus0
acpi0: using Intel ACPI CA subsystem version 20040211
acpi0: X/RSDT: OemId <A M I ,OEMRSDT ,02000511>, AslId <MSFT,00000097>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
ACPI Object Type 'Processor' (0x0c) at acpi0 not configured
PNP0A03 [PCI Bus] at acpi0 not configured
PNP0000 [AT Interrupt Controller] at acpi0 not configured
PNP0200 [AT DMA Controller] at acpi0 not configured
PNP0100 [AT Timer] at acpi0 not configured
PNP0B00 [AT Real-Time Clock] at acpi0 not configured
PNP0800 [AT-style speaker sound] at acpi0 not configured
npx0 at acpi0 (PNP0C04)
npx0: io 0xf0-0xff irq 13
npx0: using exception 16
com0 at acpi0 (PNP0501-1)
com0: io 0x3f8-0x3ff irq 4
com0: ns16550a, working fifo
com0: console
com1 at acpi0 (PNP0501-2)
com1: io 0x2f8-0x2ff irq 3
com1: ns16550a, working fifo
fdc0 at acpi0 (PNP0700)
fdc0: io 0x3f0-0x3f5,0x3f7 irq 6 drq 2
fdc0: expected BUFFER, got 4
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0103 at acpi0 not configured
PNP0C02 [Plug and Play motherboard register resources] at acpi0 not configured
PNP0C01 [System Board] at acpi0 not configured
acpibut0 at acpi0 (PNP0C0C-170): ACPI Power Button
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
PNP0C0F [PCI interrupt link device] at acpi0 not configured
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ppb0 at pci0 dev 6 function 0: Advanced Micro Devices AMD8111 I/O Hub (rev. 0x07)
pci1 at ppb0 bus 3
pci1: i/o space, memory space enabled
ohci0 at pci1 dev 0 function 0: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci0: interrupting at ioapic0 pin 19 (irq 9)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1 at pci1 dev 0 function 1: Advanced Micro Devices AMD8111 USB Host Controller (rev. 0x0b)
ohci1: interrupting at ioapic0 pin 19 (irq 9)
ohci1: OHCI version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Advanced Micro OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
satalink0 at pci1 dev 5 function 0
satalink0: Silicon Image SATALink 3114 (rev. 0x02)
satalink0: 33MHz PCI bus
satalink0: bus-master DMA support present
satalink0: using ioapic0 pin 17 (irq 10) for native-PCI interrupt
atabus0 at satalink0 channel 0
atabus1 at satalink0 channel 1
atabus2 at satalink0 channel 2
atabus3 at satalink0 channel 3
vga0 at pci1 dev 6 function 0: ATI Technologies Rage XL (rev. 0x27)
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 7 function 0
pcib0: Advanced Micro Devices AMD8111 LPC Controller (rev. 0x05)
viaide0 at pci0 dev 7 function 1
viaide0: Advanced Micro Devices AMD8111 IDE Controller (rev. 0x03)
viaide0: bus-master DMA support present
viaide0: primary channel configured to compatibility mode
viaide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus4 at viaide0 channel 0
viaide0: secondary channel configured to compatibility mode
viaide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus5 at viaide0 channel 1
Advanced Micro Devices AMD8111 SMBus Controller (SMBus serial bus, revision 0x02) at pci0 dev 7 function 2 not configured
Advanced Micro Devices AMD8111 ACPI Controller (miscellaneous bridge, revision 0x05) at pci0 dev 7 function 3 not configured
ppb1 at pci0 dev 10 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci2 at ppb1 bus 2
pci2: memory space enabled
bge0 at pci2 dev 9 function 0: Broadcom BCM5704C Dual Gigabit Ethernet
bge0: interrupting at ioapic1 pin 0 (irq 5)
bge0: ASIC BCM5704 A3 (0x2003), Ethernet address 00:e0:81:2e:52:8c
brgphy0 at bge0 phy 1: BCM5704 1000BASE-T media interface, rev. 0
brgphy0: using BCM5704 DSP patch
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1 at pci2 dev 9 function 1: Broadcom BCM5704C Dual Gigabit Ethernet
bge1: interrupting at ioapic1 pin 1 (irq 10)
bge1: ASIC BCM5704 A3 (0x2003), Ethernet address 00:e0:81:2e:52:8d
brgphy1 at bge1 phy 1: BCM5704 1000BASE-T media interface, rev. 0
brgphy1: using BCM5704 DSP patch
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
Advanced Micro Devices AMD8131 IO Apic (interrupt system, interface 0x10, revision 0x01) at pci0 dev 10 function 1 not configured
ppb2 at pci0 dev 11 function 0: Advanced Micro Devices AMD8131 PCI-X Tunnel (rev. 0x12)
pci3 at ppb2 bus 1
pci3: i/o space, memory space enabled
isp0 at pci3 dev 3 function 0: QLogic FC-AL and Fabric HBA
isp0: interrupting at ioapic2 pin 0 (irq 5)
scsibus0 at isp0: 256 targets, 8 luns per target
Advanced Micro Devices AMD8131 IO Apic (interrupt system, interface 0x10, revision 0x01) at pci0 dev 11 function 1 not configured
pchb0 at pci0 dev 24 function 0
pchb0: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb1 at pci0 dev 24 function 1
pchb1: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb2 at pci0 dev 24 function 2
pchb2: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb3 at pci0 dev 24 function 3
pchb3: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
pchb4 at pci0 dev 25 function 0
pchb4: Advanced Micro Devices AMD64 HyperTransport configuration (rev. 0x00)
pchb5 at pci0 dev 25 function 1
pchb5: Advanced Micro Devices AMD64 Address Map configuration (rev. 0x00)
pchb6 at pci0 dev 25 function 2
pchb6: Advanced Micro Devices AMD64 DRAM configuration (rev. 0x00)
pchb7 at pci0 dev 25 function 3
pchb7: Advanced Micro Devices AMD64 Miscellaneous configuration (rev. 0x00)
isa0 at pcib0
lm0 at isa0 port 0x290-0x297: W83627HF
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
ioapic1: enabling
ioapic2: enabling
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
scsibus0: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <APPLE, Xserve RAID, 1.26> disk fixed
sd0: 1035 GB, 132522 cyl, 128 head, 128 sec, 512 bytes/sect x 2171240448 sectors
wd0 at atabus4 drive 0: <ST380011A>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 76319 MB, 155061 cyl, 16 head, 63 sec, 512 bytes/sect x 156301488 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA)
atapibus0 at atabus5: 2 targets
cd0 at atapibus0 drive 0: <HL-DT-STDVD-ROM GDR8163B, , 0L23> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
wd1 at atabus5 drive 1: <Maxtor 6Y160P0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 152 GB, 317632 cyl, 16 head, 63 sec, 512 bytes/sect x 320173056 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
cd0(viaide0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)
wd1(viaide0:1:1): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
cpu1: CPU 1 running
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
>How-To-Repeat:
Run 3.0_RCx on a dual-CPU machine with an MPACPI kernel and wait for the dead-lock
(three occurred here within one week).
>Fix:
>Unformatted: