NetBSD-Bugs archive


kern/48739: Reproducible panic in ld_virtio.c on NetBSD/amd64 guest running under qemu on CentOS 6.5



>Number:         48739
>Category:       kern
>Synopsis:       Reproducible panic in ld_virtio.c on NetBSD/amd64 guest running under qemu on CentOS 6.5
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 12 13:05:00 +0000 2014
>Originator:     Sean Davis
>Release:        netbsd-6 as of 20140410
>Organization:
tinkering guild
>Environment:
NetBSD ansible.endersgame.net 6.1_STABLE NetBSD 6.1_STABLE (ANSIBLE-$Revision: 1.44 $) #4: Mon Apr  7 10:07:54 UTC 2014  root%ansible.endersgame.net@localhost:/mnt/ld2a/build/obj/sys/arch/amd64/compile/ANSIBLE amd64

>Description:
Every time I run something IO-heavy (installing world from sets, bonnie++,
extracting a very large tarball, etc) the system will panic. The panics point
to virtio; I have included a collection at the end of this PR.

These panics also take place when the system is set for IDE storage rather than
VirtIO.

In an effort to pin it down, I wrote a small program to malloc and memset 4G.
This reproduces the issue as long as swap is configured - without swap, I see
UVM kill the process as expected and the system remains operational. Relevant
output is below, followed by the kernel configuration currently in use. I have
tried it with GENERIC with the same results. I've not been able to get a crash
dump, but would be happy to provide any requested information from DDB as long
as somebody can point me to the right commands.

Note: the dmesg output shows 4094 MB RAM because I configured 4095 MB in
the hypervisor to see if that made a difference. When the system is configured
with only 1024 MB RAM, the panic does not happen. At first I attributed this to
the "Other OS" template specifying a 32-bit bus, but the same thing happens
when the guest is switched to a Red Hat template, which specifies a 64-bit bus.

The qemu version is quite old:
[dive@vmhost1 ~]$ rpm -qa|grep qemu
gpxe-roms-qemu-0.9.7-6.10.el6.noarch
qemu-img-0.12.1.2-2.415.el6_5.6.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.6.x86_64


1) when no swap is configured:
UVM: pid 958 (nbpanic), uid 0 killed: out of swap

2) when swap is configured:
[ssh session] - note: without being root, I got the UVM kill; hence the sudo.
dive@ansible ~ $ sudo ./nbpanic
trying 4294967296 bytes
malloc(4294967296)
malloc(4294967296) gave us 0x7f7ef7700000
memset(0x7f7ef7700000,1,4294967296)
Connection to ansible closed.

[virtual machine console]
uvm_fault(0xffffffff804887e0, 0xffff800047c23000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff802cc7d4 cs 8 rflags 10206 cr2  ffff800047c23ff8 cpl 8 rsp fffffe80043f27b8
panic: trap
cpu0: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ld_virtio_start() at netbsd:ld_virtio_start+0x177
ldstart() at netbsd:ldstart+0x6f
ldstrategy() at netbsd:ldstrategy+0x104
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
swstrategy() at netbsd:swstrategy+0xc2
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
uvm_swap_io() at netbsd:uvm_swap_io+0x11f
swapcluster_flush() at netbsd:swapcluster_flush+0x49
uvm_pageout() at netbsd:uvm_pageout+0x31b
cpu0: End traceback...

dump to dev 19,17 not possible
rebooting...
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007, 2008, 2009, 2010, 2011, 2012
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 6.1_STABLE (ANSIBLE-$Revision: 1.44 $) #4: Mon Apr  7 10:07:54 UTC 2014
        root%ansible.endersgame.net@localhost:/mnt/ld2a/build/obj/sys/arch/amd64/compile/ANSIBLE
total memory = 4094 MB
avail memory = 3970 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
oVirt oVirt Node (6-5.el6.centos.11.2)
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (BOCHSCPU 0.1         )
cpu0 at mainbus0 apid 0: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu1 at mainbus0 apid 1: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu2 at mainbus0 apid 2: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
cpu3 at mainbus0 apid 3: Westmere E56xx/L56xx/X56xx (Nehalem-C), id 0x206c1
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type ISA   
ioapic0 at mainbus0 apid 0: pa 0xfec00000, version 11, 24 pins
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x1237 (rev. 0x02)
pcib0 at pci0 dev 1 function 0: vendor 0x8086 product 0x7000 (rev. 0x00)
piixide0 at pci0 dev 1 function 1: Intel 82371SB IDE Interface (PIIX3) (rev. 0x00)
piixide0: bus-master DMA support present
piixide0: primary channel wired to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel wired to compatibility mode
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus1 at piixide0 channel 1
vendor 0x8086 product 0x7020 (USB serial bus, revision 0x01) at pci0 dev 1 function 2 not configured
piixpm0 at pci0 dev 1 function 3: vendor 0x8086 product 0x7113 (rev. 0x03)
timecounter: Timecounter "piixpm0" frequency 3579545 Hz quality 1000
piixpm0: 24-bit timer
piixpm0: interrupting at ioapic0 pin 9
iic0 at piixpm0: I2C bus
vga0 at pci0 dev 2 function 0: vendor 0x1b36 product 0x0100 (rev. 0x04)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
virtio0 at pci0 dev 3 function 0
virtio0: Virtio Network Device (rev. 0x00)
vioif0 at virtio0: Ethernet address 00:1a:4a:10:d4:25
virtio0: allocated 20480 byte for virtqueue 0 for rx, size 256
virtio0: using 8192 byte (512 entries) indirect descriptors
virtio0: allocated 81920 byte for virtqueue 1 for tx, size 256
virtio0: using 69632 byte (4352 entries) indirect descriptors
virtio0: allocated 8192 byte for virtqueue 2 for control, size 64
virtio0: interrupting at ioapic0 pin 11
virtio1 at pci0 dev 4 function 0
virtio1: Virtio Console Device (rev. 0x00)
virtio1: no matching child driver; not configured
virtio2 at pci0 dev 5 function 0
virtio2: Virtio Block Device (rev. 0x00)
ld0 at virtio2
virtio2: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio2: using 36864 byte (2304 entries) indirect descriptors
ld0: 200 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 419430400 sectors
virtio2: interrupting at ioapic0 pin 10
virtio3 at pci0 dev 6 function 0
virtio3: Virtio Block Device (rev. 0x00)
ld1 at virtio3
virtio3: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio3: using 36864 byte (2304 entries) indirect descriptors
ld1: 20480 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 41943040 sectors
virtio3: interrupting at ioapic0 pin 10
virtio4 at pci0 dev 7 function 0
virtio4: Virtio Block Device (rev. 0x00)
ld2 at virtio4
virtio4: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio4: using 36864 byte (2304 entries) indirect descriptors
ld2: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio4: interrupting at ioapic0 pin 11
virtio5 at pci0 dev 8 function 0
virtio5: Virtio Network Device (rev. 0x00)
vioif1 at virtio5: Ethernet address 00:1a:4a:10:d4:0e
virtio5: allocated 20480 byte for virtqueue 0 for rx, size 256
virtio5: using 8192 byte (512 entries) indirect descriptors
virtio5: allocated 81920 byte for virtqueue 1 for tx, size 256
virtio5: using 69632 byte (4352 entries) indirect descriptors
virtio5: allocated 8192 byte for virtqueue 2 for control, size 64
virtio5: interrupting at ioapic0 pin 11
virtio6 at pci0 dev 9 function 0
virtio6: Virtio Block Device (rev. 0x00)
ld3 at virtio6
virtio6: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio6: using 36864 byte (2304 entries) indirect descriptors
ld3: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio6: interrupting at ioapic0 pin 10
virtio7 at pci0 dev 10 function 0
virtio7: Virtio Block Device (rev. 0x00)
ld4 at virtio7
virtio7: allocated 45056 byte for virtqueue 0 for I/O request, size 128
virtio7: using 36864 byte (2304 entries) indirect descriptors
ld4: 40960 MB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 83886080 sectors
virtio7: interrupting at ioapic0 pin 10
isa0 at pcib0
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
attimer0 at isa0 port 0x40-0x43
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
atapibus at piixide0 not configured
boot device: ld1
root on ld1a dumps on ld1b
/: replaying log to memory
root file system type: ffs
/: replaying log to disk
/mnt/wd-spindle-0: replaying log to disk
/mnt/pliant-ssd-0: replaying log to disk
/mnt/ocz-ssd-0: replaying log to disk
/mnt/wd-spindle-1: replaying log to disk
Accounting started



Another, from when it was running bonnie++ rather than my program:
uvm_fault(0xffffffff80e0ce20, 0xffff80008e8ec000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8084c104 cs 8 rflags 10206 cr2  ffff80008e8ecff8 cpl 8 rsp fffffe810f8ee3d8
panic: trap
cpu0: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ld_virtio_start() at netbsd:ld_virtio_start+0x177
ldstart() at netbsd:ldstart+0x6f
ldstrategy() at netbsd:ldstrategy+0x104
bdev_strategy() at netbsd:bdev_strategy+0x47
spec_strategy() at netbsd:spec_strategy+0x2e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x33
genfs_do_io() at netbsd:genfs_do_io+0x1a6
genfs_gop_write() at netbsd:genfs_gop_write+0x55
genfs_do_putpages() at netbsd:genfs_do_putpages+0xbe5
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x3a
ffs_write() at netbsd:ffs_write+0x2f9
VOP_WRITE() at netbsd:VOP_WRITE+0x37
vn_write() at netbsd:vn_write+0xf9
dofilewrite() at netbsd:dofilewrite+0x7d
sys_write() at netbsd:sys_write+0x62
syscall() at netbsd:syscall+0xc4
cpu0: End traceback...

I tried with and without ACPI; the trace was the same.

Kernel config:
# NetBSD 6 oVirt/KVM Kernel Configuration
#
# Minimal configuration.
#
# $egnet: ANSIBLE,v 1.44 2014/04/07 10:04:55 dive Exp $

machine amd64 x86

ident "ANSIBLE-$Revision: 1.44 $"

maxusers 64

makeoptions COPTS="-O2 -fno-omit-frame-pointer"

### BEGIN XXX
makeoptions DEBUG="-g"
options DDB
options DDB_HISTORY_SIZE=1024
options DDB_COMMANDONENTER="trace;show registers"
options INSECURE
### END XXX

options AIO
options BUFQ_FCFS
options BUFQ_DISKSORT
options COMPAT_43
options COREDUMP
options CPU_IN_CKSUM
options EXEC_ELF64
options EXEC_SCRIPT
options FILEASSOC
options HOSTZEROBROADCAST=0
options INET
options IPFILTER_LOG
options MPBIOS
options MPBIOS_SCANPCI
options MQUEUE
options MTRR
options MULTIPROCESSOR
options NTP
options P1003_1B_SEMAPHORE
options PAX_ASLR=0
options PAX_MPROTECT=0
options PCKBD_CNATTACH_MAY_FAIL
options PFIL_HOOKS
options PTRACE
options RFC2292
options RTC_OFFSET=0
options SCHED_4BSD
options SYSVMSG
options SYSVSEM
options SYSVSHM
options USER_VA0_DISABLE_DEFAULT=1
options VCONS_DRAW_INTR
options VERIFIED_EXEC_FP_SHA512
options VERIFIED_EXEC_FP_SHA256
options VGA_POST
options VMSWAP
options WAPBL
options WSDISPLAY_COMPAT_PCVT
options WSDISPLAY_COMPAT_SYSCONS
options WSEMUL_VT100
options WS_KERNEL_FG=WSCOL_GREEN
options secmodel_bsd44

file-system FFS
file-system PTYFS
file-system UNION

config netbsd root on ? type ?

mainbus0 at root
cpu* at mainbus?
ioapic* at mainbus? apid ?
pci* at mainbus? bus ?
pci* at pchb? bus ?
pchb* at pci? dev ? function ?
pcib* at pci? dev ? function ?
isa0 at pcib?
com0 at isa? port 0x3f8 irq 4
com1 at isa? port 0x2f8 irq 3
pckbc* at isa?
pckbd* at pckbc?
pms* at pckbc?
vga* at pci? dev ? function ?
wsdisplay* at vga? console ?
wsdisplay* at wsemuldisplaydev?
wskbd* at pckbd? console ?
wsmouse* at pms? mux 0
attimer0 at isa?
piixpm* at pci? dev ? function ?
iic* at piixpm?
virtio* at pci? dev ? function ?
viomb* at virtio?
ld* at virtio?
vioif* at virtio?
piixide* at pci? dev ? function ? flags 0x0000
atabus* at piixide? channel ?
wd* at atabus? drive ? flags 0x0000

pseudo-device accf_data
pseudo-device accf_http
pseudo-device bpfilter
pseudo-device bridge
pseudo-device clockctl
pseudo-device cpuctl
pseudo-device crypto
pseudo-device drvctl
pseudo-device fss
pseudo-device ipfilter
pseudo-device ksyms
pseudo-device loop
pseudo-device pty
pseudo-device rnd
pseudo-device swcrypto
pseudo-device tap
pseudo-device tun
pseudo-device veriexec 1
pseudo-device wsfont
pseudo-device wsmux

>How-To-Repeat:
Enable a swap partition, then do something that pushes memory usage into swap.
I don't think that's the only case that triggers it, but it's the one I can
reproduce.

A simple C program to malloc 4GB and then memset it to 1 reproduces this on my 
test system, but only with a swap device configured - without one, UVM kills 
the process as expected.
>Fix:
None known. Running without swap seems to avoid it, and it "feels" like it's
most likely with amounts of RAM near or above 4GB: it happens more often on 8G
than on 4G minus 1MB, and doesn't seem to happen on 1G.


