Subject: kern/13827: fatal page fault in supervisor mode and hang in 1.5.1
To: None <gnats-bugs@gnats.netbsd.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 08/29/2001 22:58:32
>Number: 13827
>Category: kern
>Synopsis: Kernel panics with fatal page fault in supervisor mode and reboot hangs while syncing disks
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Aug 29 19:54:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Andreas Wrede
>Release: 1.5.1-release
>Organization:
Planix, Inc.
>Environment:
System: NetBSD idefix.tvo.org 1.5.1 NetBSD 1.5.1 (TVO) #0: Wed Aug 29 09:25:13 EDT 2001 root@tube.tvo.org:/usr/src/sys/arch/i386/compile/TVO i386
>Description:
After upgrading two (nearly) identical Compaq Proliant 3000 servers
from NetBSD 1.4.1 to 1.5.1, both servers will sometimes panic during
periods of high I/O load, i.e. during /etc/daily runs and/or
amanda/dump backups. The kernel is a GENERIC kernel minus some
fs-types and drivers, plus IPsec.
The filesystems were mounted both with and without softdep at the
time of the various crashes.
After the panic, during 'syncing disks', the system hangs after
printing 'command timeout' messages for the SCSI devices. Breaking
into the debugger and running sync resets the SCSI bus and proceeds
to dump memory to disk, but savecore does not find the core dump in
the swap partition.
------ BOOT -----
>How-To-Repeat:
Build a custom kernel (unclear whether this is required). Run a high
I/O load, e.g. /etc/daily or an amanda/dump backup.
>Fix:
unknown
>Release-Note:
>Audit-Trail:
>Unformatted:
>> NetBSD/i386 BIOS Boot, Revision 2.7
>> (he@nsa.uninett.no, Mon Jun 18 01:32:10 CEST 2001)
>> Memory: 639/261120 k
Use hd1a:netbsd to boot sd0 when wd0 is also installed
Press return to boot now, any other key for boot menu
booting wd0a:netbsd - starting in 0
3134862+306440+263396 [65+168880+148937]=0x3d7310
[ preserving 318340 bytes of netbsd ELF symbol table ]
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 1.5.1 (TVO) #0: Wed Aug 29 09:25:13 EDT 2001
root@tube.tvo.org:/usr/src/sys/arch/i386/compile/TVO
cpu0: Intel Pentium III (Katmai) (686-class), 498.72 MHz
total memory = 255 MB
avail memory = 232 MB
using 3297 buffers containing 13188 KB of memory
BIOS32 rev. 0 found at 0xf0000
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled
pchb0 at pci0 dev 0 function 0
pchb0: Intel 82443BX Host Bridge/Controller (AGP disabled) (rev. 0x03)
vga1 at pci0 dev 11 function 0: Cirrus Logic CL-GD5446 (rev. 0x45)
wsdisplay0 at vga1
ppb0 at pci0 dev 13 function 0: Digital Equipment DECchip 21150 PCI-PCI Bridge (rev. 0x04)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
tl0 at pci1 dev 7 function 0
tl0: Compaq ProLiant Integrated Netelligent 10/100 TX
tl0: Ethernet address 00:50:8b:8b:48:e2
tl0: interrupting at irq 5
nsphy0 at tl0 phy 1: DP83840 10/100 media interface, rev. 1
nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
tlphy0 at tl0 phy 31: ThunderLAN 10baseT media interface, rev. 5
tlphy0: 10base2
siop0 at pci1 dev 9 function 0: Symbios Logic 53c875 (ultra-wide scsi)
siop0: using on-board RAM
siop0: interrupting at irq 9
scsibus0 at siop0: 16 targets, 8 luns per target
siop1 at pci1 dev 9 function 1: Symbios Logic 53c875 (ultra-wide scsi)
siop1: using on-board RAM
siop1: interrupting at irq 10
scsibus1 at siop1: 16 targets, 8 luns per target
Compaq product 0xa0f0 (miscellaneous system) at pci0 dev 14 function 0 not configured
fxp0 at pci0 dev 15 function 0: Intel i82557 Ethernet, rev 5
fxp0: interrupting at irq 11
fxp0: Ethernet address 00:50:8b:65:19:22, 10/100 Mb/s
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 20 function 0
pcib0: Intel 82371AB PCI-to-ISA Bridge (PIIX4) (rev. 0x02)
pciide0 at pci0 dev 20 function 1: Intel 82371AB IDE controller (PIIX4) (rev. 0x01)
pciide0: bus-master DMA support present
pciide0: primary channel wired to compatibility mode
atapibus0 at pciide0 channel 0
cd0 at atapibus0 drive 0: <COMPAQ XM-6402B, , 1723> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2
pciide0: primary channel interrupting at irq 14
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
pciide0: secondary channel wired to compatibility mode
pciide0: secondary channel ignored (disabled)
uhci0 at pci0 dev 20 function 2: Intel 82371AB USB Host Controller (PIIX4) (rev. 0x01)
uhci0: can't map i/o space
Intel 82371AB Power Management Controller (PIIX4) (miscellaneous bridge, revision 0x02) at pci0 dev 20 function 3 not configured
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0
pcppi0 at isa0 port 0x61
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: no ISA Plug 'n Play devices found
biomask f7c5 netmask ffe5 ttymask ffe7
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
siop1: target 0 using tagged queuing
sd0 at scsibus1 target 0 lun 0: <COMPAQ, BD009122C6, B016> SCSI2 0/direct fixed
siop1: target 0 using 16bit transfers
siop1: target 0 now synchronous at 20.0Mhz, offset 16
sd0: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
siop1: target 1 using tagged queuing
sd1 at scsibus1 target 1 lun 0: <COMPAQ, BD00911934, 3B00> SCSI2 0/direct fixed
siop1: target 1 using 16bit transfers
siop1: target 1 now synchronous at 20.0Mhz, offset 15
sd1: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
siop1: target 2 using tagged queuing
sd2 at scsibus1 target 2 lun 0: <COMPAQ, BD00911934, 3B00> SCSI2 0/direct fixed
siop1: target 2 using 16bit transfers
siop1: target 2 now synchronous at 20.0Mhz, offset 15
sd2: 8678 MB, 5273 cyl, 20 head, 168 sec, 512 bytes/sect x 17773524 sectors
IPsec: Initialized Security Association Processing.
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
swapctl: adding /dev/sd0b as swap device at priority 0
--- CRASH---
fatal page fault in supervisor mode
trap type 6 code 0 eip c0183574 cs 8 eflags 10246 cr2 3c cpl d000f7c4
panic: trap
Begin traceback...
trap() at trap+0x1ed
--- trap (number 6) ---
lockmgr(c04502e4,10012,c0450368) at lockmgr+0x78
uvm_map(c04502e0,d2a96a6c,1000,c0450280,ffffffff) at uvm_map+0x79
uvm_km_valloc(c04502e0,1000,c0440c80,c09fe680,c0a7ff00) at uvm_km_valloc+0x37
_bus_dmamem_map(c0440c80,d2a96ae0,1,1000,c09fe68c) at _bus_dmamem_map+0x2e
siop_morecbd(c095ba00) at siop_morecbd+0xf9
siop_scsicmd(c097e094) at siop_scsicmd+0x52
scsipi_execute_xs(c097e094,0,1009,c0959480,d2a96b98) at scsipi_execute_xs+0x36
scsi_scsipi_cmd(c0959480,d2a96bec,a,ca65a000,2000) at scsi_scsipi_cmd+0xd3
scsipi_command(c0959480,d2a96bec,a,ca65a000,2000) at scsipi_command+0x59
sdstart(c0979400,d000f7c4,c4edc834,d2a96c2c,c02b072b) at sdstart+0x1ea
scsipi_free_xs(c097e094,1) at scsipi_free_xs+0x8b
scsipi_done(c097e094,c095ba00,ff00,1,1009) at scsipi_done+0x123
siop_scsicmd_end(c097f800,c0965c60,d2a874bc,d2a874bc,c095ba00) at siop_scsicmd_end+0x35d
siop_intr(c095ba00) at siop_intr+0x1370
Xintr10() at Xintr10+0x7c
--- interrupt ---
idle(d2a874bc) at idle+0x21
bpendtsleep(c4ebbaa8,11,c035fc43,0,0) at bpendtsleep
getblk(d2a8f680,34b00,2000,0,0) at getblk+0x8c
bread(d2a8f680,34b00,2000,ffffffff,d2a96df8) at bread+0x2d
ffs_update(d2a96e2c,d2a8fea0,d2a96ef4,d2a8fdd0,0) at ffs_update+0x1bc
ffs_full_fsync(d2a96ef4,d2a8fea0,d2a96ef4,d2a8fdd0,1) at ffs_full_fsync+0x224
ffs_fsync(d2a96ef4) at ffs_fsync+0x3a
ffs_sync(c098d200,3,c0959f80,d2a874bc) at ffs_sync+0xf3
sync_fsync(d2a96f68) at sync_fsync+0x53
sched_sync(d2a874bc) at sched_sync+0x119
End traceback...
syncing disks...sd2(siop1:2:0): command timeout
sd2(siop1:2:0): command timeout
sd2(siop1:2:0): command timeout
sd1(siop1:1:0): command timeout
sd1(siop1:1:0): command timeout
sd1(siop1:1:0): command timeout
sd1(siop1:1:0): command timeout
sd0(siop1:0:0): command timeout
sd0(siop1:0:0): command timeout
sd0(siop1:0:0): command timeout
sd0(siop1:0:0): command timeout
[...many hours pass...]
Stopped at cpu_Debugger+0x4: leave
db>
db>
db> trace
cpu_Debugger(c0965920,11,ffffffff,2c,c0a0f160) at cpu_Debugger+0x4
comintr(c095b600) at comintr+0xcd
Xintr4() at Xintr4+0x78
--- interrupt ---
ltsleep(c4ecaae8,11,c035fc9a,0,0) at ltsleep+0x4e
biowait(c4ecaae8,20d5c0,d2ab0db0,c099b000,c040b9a0) at biowait+0x31
bread(d2ab1698,20d5c0,2000,ffffffff,d2a967bc) at bread+0x95
ffs_update(d2a967f0,d2caa924,d2a968b8,d2caa5e4,0) at ffs_update+0x1bc
ffs_full_fsync(d2a968b8,d2caa924,d2a968b8,d2caa5e4,4) at ffs_full_fsync+0x224
ffs_fsync(d2a968b8) at ffs_fsync+0x3a
ffs_sync(c09eae00,2,c0959f80,c0464420,c09eae00) at ffs_sync+0xf3
sys_sync(c0464420,0,0,100,c03749fb) at sys_sync+0x5c
vfs_shutdown(d2a9696c,d2a96960,c0190635,100,0) at vfs_shutdown+0x64
cpu_reboot(100,0,d2a969b0,0,6) at cpu_reboot+0x3b
panic(c03749fb,c04502e4,0,10012,c02a4261) at panic+0xcd
trap() at trap+0x1ed
--- trap (number 6) ---
lockmgr(c04502e4,10012,c0450368) at lockmgr+0x78
uvm_map(c04502e0,d2a96a6c,1000,c0450280,ffffffff) at uvm_map+0x79
uvm_km_valloc(c04502e0,1000,c0440c80,c09fe680,c0a7ff00) at uvm_km_valloc+0x37
_bus_dmamem_map(c0440c80,d2a96ae0,1,1000,c09fe68c) at _bus_dmamem_map+0x2e
siop_morecbd(c095ba00) at siop_morecbd+0xf9
siop_scsicmd(c097e094) at siop_scsicmd+0x52
scsipi_execute_xs(c097e094,0,1009,c0959480,d2a96b98) at scsipi_execute_xs+0x36
scsi_scsipi_cmd(c0959480,d2a96bec,a,ca65a000,2000) at scsi_scsipi_cmd+0xd3
scsipi_command(c0959480,d2a96bec,a,ca65a000,2000) at scsipi_command+0x59
sdstart(c0979400,d000f7c4,c4edc834,d2a96c2c,c02b072b) at sdstart+0x1ea
scsipi_free_xs(c097e094,1) at scsipi_free_xs+0x8b
scsipi_done(c097e094,c095ba00,ff00,1,1009) at scsipi_done+0x123
siop_scsicmd_end(c097f800,c0965c60,d2a874bc,d2a874bc,c095ba00) at siop_scsicmd_end+0x35d
siop_intr(c095ba00) at siop_intr+0x1370
Xintr10() at Xintr10+0x7c
--- interrupt ---
idle(d2a874bc) at idle+0x21
bpendtsleep(c4ebbaa8,11,c035fc43,0,0) at bpendtsleep
getblk(d2a8f680,34b00,2000,0,0) at getblk+0x8c
bread(d2a8f680,34b00,2000,ffffffff,d2a96df8) at bread+0x2d
ffs_update(d2a96e2c,d2a8fea0,d2a96ef4,d2a8fdd0,0) at ffs_update+0x1bc
ffs_full_fsync(d2a96ef4,d2a8fea0,d2a96ef4,d2a8fdd0,1) at ffs_full_fsync+0x224
ffs_fsync(d2a96ef4) at ffs_fsync+0x3a
ffs_sync(c098d200,3,c0959f80,d2a874bc) at ffs_sync+0xf3
sync_fsync(d2a96f68) at sync_fsync+0x53
sched_sync(d2a874bc) at sched_sync+0x119
db> sync
dumping to dev 4,1 offset 500487
dump siop1: scsi bus reset
cmd 0xc097fa80 (target 0:0) in reset list
cmd 0xc097f840 (target 0:0) in reset list
cmd 0xc097f980 (target 0:0) in reset list
cmd 0xc097f900 (target 0:0) in reset list
cmd 0xc0aca000 (target 0:0) in reset list
cmd 0xc097f8c0 (target 1:0) in reset list
cmd 0xc097f9c0 (target 1:0) in reset list
cmd 0xc097fa00 (target 1:0) in reset list
cmd 0xc097fac0 (target 1:0) in reset list
cmd 0xc097f940 (target 2:0) in reset list
cmd 0xc097f880 (target 2:0) in reset list
cmd 0xc097fa40 (target 2:0) in reset list
cmd 0xc097f800 (target 2:0) in reset list
cmd 0xc097fa80 (status 2) about to be processed
cmd 0xc097f840 (status 2) about to be processed
cmd 0xc097f980 (status 2) about to be processed
cmd 0xc097f900 (status 2) about to be processed
cmd 0xc0aca000 (status 2) about to be processed
cmd 0xc097f8c0 (status 2) about to be processed
cmd 0xc097f9c0 (status 2) about to be processed
cmd 0xc097fa00 (status 2) about to be processed
cmd 0xc097fac0 (status 2) about to be processed
cmd 0xc097f940 (status 2) about to be processed
cmd 0xc097f880 (status 2) about to be processed
cmd 0xc097fa40 (status 2) about to be processed
cmd 0xc097f800 (status 0) about to be processed
siop1: target 0 using 16bit transfers
siop1: target 0 now synchronous at 20.0Mhz, offset 16
siop1: target 1 using 16bit transfers
siop1: target 1 now synchronous at 20.0Mhz, offset 15
siop1: target 2 using 16bit transfers
siop1: target 2 now synchronous at 20.0Mhz, offset 15
255 254 .................
rebooting
-------
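
As a reading aid for the "trap type 6 code 0 ... cr2 3c" line in the crash
output above, here is a minimal sketch that decodes the low three bits of
the i386 page-fault error code. The bit meanings follow the Intel
architecture definition; the function name is illustrative only, and the
sketch assumes the kernel prints the code and cr2 fields in hexadecimal.
It is an interpretation aid, not a diagnosis of the fault.

```python
# Decode the i386 page-fault error code printed as "code" in the trap line.
# Only the low three bits (present, write, user) are interpreted here.
# decode_pf_code is a hypothetical helper name, not a NetBSD interface.

def decode_pf_code(code):
    """Return a human-readable description of a page-fault error code."""
    cause = "protection violation" if code & 0x1 else "page not present"
    access = "write" if code & 0x2 else "read"
    mode = "user" if code & 0x4 else "supervisor"
    return f"{access} in {mode} mode, {cause}"

# Values copied from the trap line in this report (assumed to be hex).
code = 0x0
cr2 = 0x3c

print(decode_pf_code(code))
print(f"faulting address {cr2:#x}")
```

Read this way, the trap is a read of an unmapped page at address 0x3c while
in supervisor mode, which would be consistent with a small offset from a
null pointer inside lockmgr().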