Subject: kern/21598: lock-up during memory shortage
To: None <gnats-bugs@gnats.netbsd.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 05/16/2003 08:53:07
>Number: 21598
>Category: kern
>Synopsis: lock-up during memory shortage
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri May 16 12:54:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Andreas Wrede <andreas@planix.com>
>Release: NetBSD 1.6T and NetBSD 1.6.1
>Organization:
Planix, Inc.
>Environment:
System: NetBSD wonder.wrede.pvt 1.6.1 NetBSD 1.6.1 (PLANIX) #2: Sun May 11 12:00:04 EDT 2003 root@wonder.wrede.pvt:/usr/src/sys/arch/i386/compile/PLANIX i386
Architecture: i386
Machine: i386
>Description:
I am running 1.6.1 on an i386 machine with 128 MB of memory. I get regular
lock-ups during periods of heavy memory and swap activity, typically
during a nightly rsync run against a large filesystem on a remote
machine. I can still break into the debugger, but all higher-level
functions are frozen (I don't even get character echo on the serial
console). The tracebacks for the last two lock-ups are:
pmap_extract(c0542400,c0cf1000,cbc98cac,c0241d56) at pmap_extract+0xc
uvm_km_pgremove_intrsafe(c0cdf000,c0d28000,c0cdf000,c0307bf0,c04fc8e0,cf8000,30000,0,1727,0,cbc98d0c,c04fc908,c04fc908,0,cbc98d30,c0301b5f,c04fc8e0,c0cdf000,c0d28000,cbc98d2c,c0cf8000,cf8000,30000,e000ffe6,1,3,cbc98d70,cbc98d2c,0,0,cbc98d70,c0300c87,c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0,49000,c052bce0,c024217a,49000,c0c16000,cbc98db0,c023bdf1,49000,c0cdf000,cbc98dc0,c023bdf1,c04fc8e0,0,49000,1,cbcac3d0,49000,1,ffffffff,49000,0,cbc98dd0,ffffffff,49000,0) at uvm_km_pgremove_intrsafe+0x2a
uvm_unmap_remove(c04fc8e0,c0cdf000,c0d28000,cbc98d2c) at uvm_unmap_remove+0x100
uvm_unmap(c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0) at uvm_unmap+0x8f
uvm_km_kmemalloc(c04fc8e0,0,49000,1,cbcac3d0) at uvm_km_kmemalloc+0x7f
malloc(49000,52,1,0,cbb952f8) at malloc+0x249
amap_copy(cbb952f8,cbcb0780,1,1,f9eb000,f9eb001,cbc98f48,286) at amap_copy+0x16e
uvmfault_amapcopy(cbc98f34,6,0,1,0) at uvmfault_amapcopy+0x128
uvm_fault(cbb952f8,f9eb000,0,2,4811f094) at uvm_fault+0x1b6
trap() at trap+0x4d4
--- trap (number 6) ---
0x481112e4:
and
uvm_pagealloc_strat(0,c7e000,0,0,1,0,0,1727) at uvm_pagealloc_strat+0x141
uvm_km_kmemalloc(c04fc8e0,0,49000,1,cba5fd48) at uvm_km_kmemalloc+0xbd
malloc(49000,52,1,0,cb9c0478) at malloc+0x249
amap_copy(cb9c0478,cba465d0,1,1,1a2bb000,1a2bb001,cbbf2f48,202) at amap_copy+0x16e
uvmfault_amapcopy(cbbf2f34,6,0,1,0) at uvmfault_amapcopy+0x128
uvm_fault(cb9c0478,1a2bb000,0,2,4811f094) at uvm_fault+0x1b6
trap() at trap+0x4d4
--- trap (number 6) ---
0x481112e4:
I caught the last one with systat vm and top running:
2 users Load 2.68 1.96 1.05 Wed May 14 05:57:51
memory totals (in KB) PAGING SWAPPING Interrupts
real virtual free in out in out 1425 total
Active 67988 374524 1020 ops 1320 3 100 irq0
All 123180 429716 469092 pages 20 irq4
irq6
Proc:r d s w Csw Trp Sys Int Sof Flt forks irq10
1 9 1344 2651 40 1426 1432 1330 fkppw 2 irq11
fksvm 1323 irq12
6.4% Sy 20.8% Us 0.0% Ni 2.0% In 70.9% Id pwait irq15
| | | | | | | | | | | 1320 relck
===>>>>>>>>>>>% 1320 rlkok
noram
Namei Sys-cache Proc-cache ndcpy
Calls hits % hits % fltcp
zfod
cow
Discs cd0 wd0 wd1 fd0 md0 64 fmin
seeks 85 ftarg
xfers 659 664 8596 itarg
Kbyte 2671 2704 436 wired
%busy 45.7 28.7 1308 pdfre
load averages: 2.71, 1.97, 1.06 05:57:55
62 processes: 1 runnable, 60 sleeping, 1 on processor
CPU states: 3.0% user, 0.0% nice, 6.5% system, 2.5% interrupt, 88.1% idle
Memory: 67M Act, 34M Inact, 1744K Wired, 2408K Exec, 1868K File, 300K Free
Swap: 756M Total, 299M Used, 457M Free
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
2012 root -5 0 290M 96M RUN 2:20 25.68% 25.68% rsync
6 root -18 0 0K 14M pgdaemon 0:12 0.44% 0.44% [pagedaemon]
8 root 18 0 0K 14M syncer 0:24 0.00% 0.00% [ioflush]
1009 root 28 0 244K 804K CPU 0:16 0.00% 0.00% top
1008 root 3 0 472K 676K ttyin 0:07 0.00% 0.00% systat
315 andreas 2 0 524K 1308K select 0:06 0.00% 0.00% sshd
243 root 2 0 4908K 1200K select 0:06 0.00% 0.00% squid
238 root 2 0 380K 4K select 0:05 0.00% 0.00% <sshd>
293 andreas 2 0 524K 1308K select 0:03 0.00% 0.00% sshd
219 root 18 -12 716K 1552K pause 0:02 0.00% 0.00% ntpd
121 root 10 0 672K 468K nanoslee 0:01 0.00% 0.00% ipmon
2015 root 2 0 476K 4K select 0:01 0.00% 0.00% <rcmd>
9 root -18 0 0K 14M aiodoned 0:01 0.00% 0.00% [aiodoned]
377 root 18 0 540K 4K pause 0:00 0.00% 0.00% <ksh>
335 root 18 0 540K 4K pause 0:00 0.00% 0.00% <ksh>
316 andreas 18 0 504K 4K pause 0:00 0.00% 0.00% <ksh>
294 andreas 18 0 504K 4K pause 0:00 0.00% 0.00% <ksh>
----
The problem exists in 1.6T as of May 14. The GENERIC kernel locked up
on the second rsync run:
uvm_pagealloc_strat(0,ee0000,0,0,1) at netbsd:uvm_pagealloc_strat+0x153
uvm_km_kmemalloc(c06e6680,0,49000,400000,cb7d1f34) at netbsd:uvm_km_kmemalloc+0x99
malloc(49000,c06b9560,1,0,cb767d84) at netbsd:malloc+0x1d2
amap_copy(cb767d84,cb81d554,1,1,f9db000) at netbsd:amap_copy+0x18d
uvmfault_amapcopy(cb812f34,6,0,1,0) at netbsd:uvmfault_amapcopy+0x128
uvm_fault(cb767d84,f9db000,0,2,4811f094) at netbsd:uvm_fault+0x1b6
trap() at netbsd:trap+0x500
--- trap (number 6) ---
0x481112e4:
db>
NetBSD 1.6.1 (PLANIX) #2: Sun May 11 12:00:04 EDT 2003
root@wonder.wrede.pvt:/usr/src/sys/arch/i386/compile/PLANIX
cpu0: Intel Pentium III (Coppermine) (686-class), 701.68 MHz
cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 8-way
cpu0: features 387f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features 387f9ff<PGE,MCA,CMOV,FGPAT,PSE36,PN,MMX>
cpu0: features 387f9ff<FXSR,SSE>
cpu0: serial number 0000-0683-0003-017B-4660-0B30
total memory = 127 MB
avail memory = 113 MB
using 1658 buffers containing 6632 KB of memory
BIOS32 rev. 0 found at 0xf04e0
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: VIA Technologies VT82C691 (Apollo Pro) Host-PCI (rev. 0x42)
agp0 at pchb0: aperture at 0xe4000000, size 0x10000000
ppb0 at pci0 dev 1 function 0: VIA Technologies VT82C598 (Apollo MVP3) CPU-AGP Bridge (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga1 at pci1 dev 0 function 0: 3Dfx Interactive Voodoo3 (rev. 0x01)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 7 function 0
pcib0: VIA Technologies VT82C596A (Apollo Pro) PCI-ISA Bridge (rev. 0x12)
pciide0 at pci0 dev 7 function 1: VIA Technologies VT82C596A (Apollo Pro) ATA66 controller
pciide0: bus-master DMA support present
pciide0: primary channel configured to compatibility mode
pciide0: primary channel ignored (disabled)
pciide0: secondary channel configured to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 0: <MATSHITA CR-571, , 1.0d> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 3
pciide0: secondary channel interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 3
uhci0 at pci0 dev 7 function 2: VIA Technologies VT83C572 USB Controller (rev. 0x08)
uhci0: interrupting at irq 14
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: VIA Technologie UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pchb1 at pci0 dev 7 function 3
pchb1: VIA Technologies product 0x3050 (rev. 0x20)
eap0 at pci0 dev 9 function 0: Ensoniq AudioPCI 97 ES1373B (rev. 0x06)
eap0: interrupting at irq 14
eap0: Crystal CS4297 codec; headphone, 18 bit DAC, 18 bit ADC, no 3D stereo
audio0 at eap0: full duplex, mmap, independent
midi0 at eap0: AudioPCI MIDI UART
pciide1 at pci0 dev 10 function 0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
pciide1: bus-master DMA support present
pciide1: primary channel configured to native-PCI mode
pciide1: using irq 12 for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0: <WDC WD200BB-00AUA1>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19092 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 39102336 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
pciide1: secondary channel configured to native-PCI mode
wd1 at pciide1 channel 1 drive 0: <IC35L120AVVA07-0>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 115 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 241254720 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
vr0 at pci0 dev 11 function 0: VIA VT3043 (Rhine) 10/100 Ethernet
vr0: interrupting at irq 10
vr0: Ethernet address: 00:50:ba:aa:23:6f
ukphy0 at vr0 phy 8: Generic IEEE 802.3u media interface
ukphy0: Am79C873 10/100 media interface (OUI 0x000676, model 0x0000), rev. 0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ex0 at pci0 dev 12 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
ex0: interrupting at irq 11
ex0: MAC address 00:50:da:c6:d3:ef
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
az0 at isa0 port 0x350: Aztech/PackardBell
radio0 at az0
pcppi0 at isa0 port 0x61
midi1 at pcppi0: PC speaker
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2
APM power mgmt engage (device 1): power management disabled (0x10f)
biomask f3e7 netmask ffe7 ttymask ffe7
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
IP Filter: v3.4.29 initialized. Default = pass all, Logging = enabled
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
>How-To-Repeat:
On a machine with a 1.6.1 userland, a 1.6T GENERIC kernel, and 128 MB of
memory, repeatedly run rsync against a large filesystem (62 GB, 600k files)
on a remote machine:
rsync -axH --stats --delete server:/u5 /big1/server/u5
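One way to run it "repeatedly" unattended is a small sh loop, sketched here. The helper name repeat_run and the pass count in the usage line are illustrative assumptions, not part of the original setup (where rsync ran from a nightly job). The timestamps are only there to help match a pass against the time of a lock-up.

```shell
#!/bin/sh
# repeat_run COUNT CMD [ARGS...]
# Hypothetical helper: run CMD COUNT times in a row, printing a
# timestamp before each pass and the command's exit status after it.
repeat_run() {
    count=$1
    shift
    pass=1
    while [ "$pass" -le "$count" ]; do
        echo "pass $pass of $count: $(date)"
        status=0
        "$@" || status=$?
        echo "pass $pass exit status: $status"
        pass=$((pass + 1))
    done
}
```

For example (5 passes is an arbitrary choice; the GENERIC kernel above locked up on the second run):

    repeat_run 5 rsync -axH --stats --delete server:/u5 /big1/server/u5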
>Fix:
Unknown. Install more memory?
>Release-Note:
>Audit-Trail:
>Unformatted: