Subject: kern/21598: lock-up during memory shortage
To: None <gnats-bugs@gnats.netbsd.org>
From: Andreas Wrede <andreas@planix.com>
List: netbsd-bugs
Date: 05/16/2003 08:53:07
>Number:         21598
>Category:       kern
>Synopsis:       lock-up during memory shortage
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 16 12:54:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Andreas Wrede <andreas@planix.com>
>Release:        NetBSD 1.6T and NetBSD 1.6.1
>Organization:
Planix, Inc.
>Environment:
	
	
System: NetBSD wonder.wrede.pvt 1.6.1 NetBSD 1.6.1 (PLANIX) #2: Sun May 11 12:00:04 EDT 2003   root@wonder.wrede.pvt:/usr/src/sys/arch/i386/compile/PLANIX i386
Architecture: i386
Machine: i386
>Description:
I am running 1.6.1 on an i386 machine with 128 MB of memory. I get regular
lock-ups during periods of heavy memory and swap activity, typically
during a nightly rsync run against a large filesystem on a remote
machine.  I can still break into the debugger, but all higher-level
functions are frozen (I don't even get character echo on the serial
console). The tracebacks for the last two lock-ups are:

pmap_extract(c0542400,c0cf1000,cbc98cac,c0241d56) at pmap_extract+0xc
uvm_km_pgremove_intrsafe(c0cdf000,c0d28000,c0cdf000,c0307bf0,c04fc8e0,cf8000,30000,0,1727,0,cbc98d0c,c04fc908,c04fc908,0,cbc98d30,c0301b5f,c04fc8e0,c0cdf000,c0d28000,cbc98d2c,c0cf8000,cf8000,30000,e000ffe6,1,3,cbc98d70,cbc98d2c,0,0,cbc98d70,c0300c87,c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0,49000,c052bce0,c024217a,49000,c0c16000,cbc98db0,c023bdf1,49000,c0cdf000,cbc98dc0,c023bdf1,c04fc8e0,0,49000,1,cbcac3d0,49000,1,ffffffff,49000,0,cbc98dd0,ffffffff,49000,0) at uvm_km_pgremove_intrsafe+0x2a
uvm_unmap_remove(c04fc8e0,c0cdf000,c0d28000,cbc98d2c) at uvm_unmap_remove+0x100
uvm_unmap(c04fc8e0,c0cdf000,c0d28000,1727,c052c5e0) at uvm_unmap+0x8f
uvm_km_kmemalloc(c04fc8e0,0,49000,1,cbcac3d0) at uvm_km_kmemalloc+0x7f
malloc(49000,52,1,0,cbb952f8) at malloc+0x249
amap_copy(cbb952f8,cbcb0780,1,1,f9eb000,f9eb001,cbc98f48,286) at amap_copy+0x16e
uvmfault_amapcopy(cbc98f34,6,0,1,0) at uvmfault_amapcopy+0x128
uvm_fault(cbb952f8,f9eb000,0,2,4811f094) at uvm_fault+0x1b6
trap() at trap+0x4d4
--- trap (number 6) ---
0x481112e4:


and

uvm_pagealloc_strat(0,c7e000,0,0,1,0,0,1727) at uvm_pagealloc_strat+0x141
uvm_km_kmemalloc(c04fc8e0,0,49000,1,cba5fd48) at uvm_km_kmemalloc+0xbd
malloc(49000,52,1,0,cb9c0478) at malloc+0x249
amap_copy(cb9c0478,cba465d0,1,1,1a2bb000,1a2bb001,cbbf2f48,202) at amap_copy+0x16e
uvmfault_amapcopy(cbbf2f34,6,0,1,0) at uvmfault_amapcopy+0x128
uvm_fault(cb9c0478,1a2bb000,0,2,4811f094) at uvm_fault+0x1b6
trap() at trap+0x4d4
--- trap (number 6) ---
0x481112e4:


I caught the last one in systat vm and top:

    2 users    Load  2.68  1.96  1.05                  Wed May 14 05:57:51

          memory totals (in KB)             PAGING   SWAPPING      Interrupts
         real   virtual    free             in  out   in  out      1425 total
Active  67988    374524    1020     ops   1320    3                 100 irq0
All    123180    429716  469092     pages        20                     irq4
                                                                        irq6
Proc:r  d  s  w    Csw   Trp   Sys  Int  Sof   Flt        forks         irq10
        1  9      1344  2651    40 1426 1432  1330        fkppw       2 irq11
                                                          fksvm    1323 irq12
   6.4% Sy  20.8% Us   0.0% Ni   2.0% In  70.9% Id        pwait         irq15
|    |    |    |    |    |    |    |    |    |    |  1320 relck
===>>>>>>>>>>>%                                      1320 rlkok
                                                          noram
Namei         Sys-cache     Proc-cache                    ndcpy
    Calls     hits    %     hits     %                    fltcp
                                                          zfod
                                                          cow
Discs  cd0  wd0  wd1  fd0  md0                         64 fmin
seeks                                                  85 ftarg
xfers       659  664                                 8596 itarg
Kbyte      2671 2704                                  436 wired
%busy      45.7 28.7                                 1308 pdfre


load averages:  2.71,  1.97,  1.06                                     05:57:55
62 processes:  1 runnable, 60 sleeping, 1 on processor
CPU states:  3.0% user,  0.0% nice,  6.5% system,  2.5% interrupt, 88.1% idle
Memory: 67M Act, 34M Inact, 1744K Wired, 2408K Exec, 1868K File, 300K Free
Swap: 756M Total, 299M Used, 457M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
 2012 root      -5    0   290M   96M RUN        2:20 25.68% 25.68% rsync
    6 root     -18    0     0K   14M pgdaemon   0:12  0.44%  0.44% [pagedaemon]
    8 root      18    0     0K   14M syncer     0:24  0.00%  0.00% [ioflush]
 1009 root      28    0   244K  804K CPU        0:16  0.00%  0.00% top
 1008 root       3    0   472K  676K ttyin      0:07  0.00%  0.00% systat
  315 andreas    2    0   524K 1308K select     0:06  0.00%  0.00% sshd
  243 root       2    0  4908K 1200K select     0:06  0.00%  0.00% squid
  238 root       2    0   380K    4K select     0:05  0.00%  0.00% <sshd>
  293 andreas    2    0   524K 1308K select     0:03  0.00%  0.00% sshd
  219 root      18  -12   716K 1552K pause      0:02  0.00%  0.00% ntpd
  121 root      10    0   672K  468K nanoslee   0:01  0.00%  0.00% ipmon
 2015 root       2    0   476K    4K select     0:01  0.00%  0.00% <rcmd>
    9 root     -18    0     0K   14M aiodoned   0:01  0.00%  0.00% [aiodoned]
  377 root      18    0   540K    4K pause      0:00  0.00%  0.00% <ksh>
  335 root      18    0   540K    4K pause      0:00  0.00%  0.00% <ksh>
  316 andreas   18    0   504K    4K pause      0:00  0.00%  0.00% <ksh>
  294 andreas   18    0   504K    4K pause      0:00  0.00%  0.00% <ksh>
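
As a reading aid for the tracebacks above: assuming ddb prints call
arguments in hexadecimal (its usual behaviour on i386), the first argument
in each malloc(49000,...) frame is a request for 0x49000 bytes. The sketch
below only converts that figure to bytes, KB and 4 KB pages; the variable
names are illustrative, and it makes no claim about the cause of the hang.

```shell
#!/bin/sh
# Assumption: the ddb argument "49000" is hexadecimal, i.e. 0x49000.
req_bytes=$((0x49000))
page_size=4096                        # i386 page size in bytes
req_kb=$((req_bytes / 1024))
req_pages=$((req_bytes / page_size))
echo "$req_bytes bytes = $req_kb KB = $req_pages pages"
```

For comparison, top reports 300K Free and systat reports 1020 KB free at
the time of the capture.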


----
The problem exists in 1.6T as of May 14. The GENERIC kernel locked up
on the second rsync run:

uvm_pagealloc_strat(0,ee0000,0,0,1) at netbsd:uvm_pagealloc_strat+0x153
uvm_km_kmemalloc(c06e6680,0,49000,400000,cb7d1f34) at netbsd:uvm_km_kmemalloc+0x99
malloc(49000,c06b9560,1,0,cb767d84) at netbsd:malloc+0x1d2
amap_copy(cb767d84,cb81d554,1,1,f9db000) at netbsd:amap_copy+0x18d
uvmfault_amapcopy(cb812f34,6,0,1,0) at netbsd:uvmfault_amapcopy+0x128
uvm_fault(cb767d84,f9db000,0,2,4811f094) at netbsd:uvm_fault+0x1b6
trap() at netbsd:trap+0x500
--- trap (number 6) ---
0x481112e4:
db>

NetBSD 1.6.1 (PLANIX) #2: Sun May 11 12:00:04 EDT 2003
    root@wonder.wrede.pvt:/usr/src/sys/arch/i386/compile/PLANIX
cpu0: Intel Pentium III (Coppermine) (686-class), 701.68 MHz
cpu0: I-cache 16 KB 32b/line 4-way, D-cache 16 KB 32b/line 2-way
cpu0: L2 cache 256 KB 32b/line 8-way
cpu0: features 387f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR>
cpu0: features 387f9ff<PGE,MCA,CMOV,FGPAT,PSE36,PN,MMX>
cpu0: features 387f9ff<FXSR,SSE>
cpu0: serial number 0000-0683-0003-017B-4660-0B30
total memory = 127 MB
avail memory = 113 MB
using 1658 buffers containing 6632 KB of memory
BIOS32 rev. 0 found at 0xf04e0
mainbus0 (root)
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: VIA Technologies VT82C691 (Apollo Pro) Host-PCI (rev. 0x42)
agp0 at pchb0: aperture at 0xe4000000, size 0x10000000
ppb0 at pci0 dev 1 function 0: VIA Technologies VT82C598 (Apollo MVP3) CPU-AGP Bridge (rev. 0x00)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled
vga1 at pci1 dev 0 function 0: 3Dfx Interactive Voodoo3 (rev. 0x01)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 7 function 0
pcib0: VIA Technologies VT82C596A (Apollo Pro) PCI-ISA Bridge (rev. 0x12)
pciide0 at pci0 dev 7 function 1: VIA Technologies VT82C596A (Apollo Pro) ATA66 controller
pciide0: bus-master DMA support present
pciide0: primary channel configured to compatibility mode
pciide0: primary channel ignored (disabled)
pciide0: secondary channel configured to compatibility mode
atapibus0 at pciide0 channel 1: 2 targets
cd0 at atapibus0 drive 0: <MATSHITA CR-571, , 1.0d> type 5 cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 3
pciide0: secondary channel interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 3
uhci0 at pci0 dev 7 function 2: VIA Technologies VT83C572 USB Controller (rev. 0x08)
uhci0: interrupting at irq 14
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: VIA Technologie UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pchb1 at pci0 dev 7 function 3
pchb1: VIA Technologies product 0x3050 (rev. 0x20)
eap0 at pci0 dev 9 function 0: Ensoniq AudioPCI 97 ES1373B (rev. 0x06)
eap0: interrupting at irq 14
eap0: Crystal CS4297 codec; headphone, 18 bit DAC, 18 bit ADC, no 3D stereo
audio0 at eap0: full duplex, mmap, independent
midi0 at eap0: AudioPCI MIDI UART
pciide1 at pci0 dev 10 function 0: Promise Ultra100TX2/ATA Bus Master IDE Accelerator (rev. 0x01)
pciide1: bus-master DMA support present
pciide1: primary channel configured to native-PCI mode
pciide1: using irq 12 for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0: <WDC WD200BB-00AUA1>
wd0: drive supports 16-sector PIO transfers, LBA addressing
wd0: 19092 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 39102336 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
pciide1: secondary channel configured to native-PCI mode
wd1 at pciide1 channel 1 drive 0: <IC35L120AVVA07-0>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 115 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 241254720 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(pciide1:1:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA data transfers)
vr0 at pci0 dev 11 function 0: VIA VT3043 (Rhine) 10/100 Ethernet
vr0: interrupting at irq 10
vr0: Ethernet address: 00:50:ba:aa:23:6f
ukphy0 at vr0 phy 8: Generic IEEE 802.3u media interface
ukphy0: Am79C873 10/100 media interface (OUI 0x000676, model 0x0000), rev. 0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ex0 at pci0 dev 12 function 0: 3Com 3c905C-TX 10/100 Ethernet with mngmt (rev. 0x74)
ex0: interrupting at irq 11
ex0: MAC address 00:50:da:c6:d3:ef
bmtphy0 at ex0 phy 24: Broadcom 3c905C internal PHY, rev. 6
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
az0 at isa0 port 0x350: Aztech/PackardBell
radio0 at az0
pcppi0 at isa0 port 0x61
midi1 at pcppi0: PC speaker
spkr0 at pcppi0
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
isapnp0: no ISA Plug 'n Play devices found
apm0 at mainbus0: Power Management spec V1.2
APM power mgmt engage (device 1): power management disabled (0x10f)
biomask f3e7 netmask ffe7 ttymask ffe7
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
boot device: wd0
root on wd0a dumps on wd0b
root file system type: ffs
IP Filter: v3.4.29 initialized.  Default = pass all, Logging = enabled
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)

>How-To-Repeat:
	On a machine with a 1.6.1 userland, a 1.6T GENERIC kernel and 128 MB of
memory, repeatedly run rsync against a large filesystem (62 GB, 600k files)
on a remote machine:
rsync -axH --stats --delete server:/u5 /big1/server/u5
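
One way to repeat the run unattended is a small loop around the command
above. This is a minimal sketch: the names RUNS, RSYNC and run_repro are
hypothetical, and the default of five runs is arbitrary. Each run is
numbered and timestamped so that the last line printed before a lock-up
shows which run triggered it.

```shell
#!/bin/sh
# Hypothetical helper: repeat the rsync run from this report RUNS times.
RUNS=${RUNS:-5}          # number of runs (arbitrary default)
RSYNC=${RSYNC:-rsync}    # command to run; overridable for a dry run

run_repro() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        # Log before starting, so the run number survives a hang.
        echo "run $i of $RUNS: $(date)"
        "$RSYNC" -axH --stats --delete server:/u5 /big1/server/u5 || return 1
        i=$((i + 1))
    done
}
```

Call run_repro from a shell on the serial console to start the runs.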
	
>Fix:
	Unknown.  Install more memory?
>Release-Note:
>Audit-Trail:
>Unformatted: