Port-xen archive
Re: Large file copy seems to cause dom0 kernel panic
On Oct 15, 2011, at 3:31 PM, Steven Senator wrote:
> I saw and reported this about a year ago (see:
> http://mail-index.netbsd.org/port-xen/2010/07/27/msg006181.html). It is
> the same kernel stack trace. I only saw problems when running
> dd if=/dev/zero of=file-backed-domu-boot-disk-image bs=1048576 count=8192
> within a domU. I could not reproduce this with the GENERIC kernel. My
> motherboard is an Opteron SuperMicro H8QM8-2+ (multiprocessor, 24 GB of
> memory). I was also using solid-state disks (Intel X-25). My
> suspicion is that because these "disks" are fast, with a large memory
> footprint the VM system was running into edge cases where pages got
> pushed out to disk faster than with traditional spinning platters, and
> that there is probably a missing lock that is not normally exposed with
> slower I/O. Unfortunately, when I get a panic there is a secondary
> panic that prevents the dump from happening, so I could only get a
> screenshot.
>
> I can provide remote access to the system if it would be helpful.
>
> -Steve Senator
>
>
> On Sat, Oct 15, 2011 at 4:41 AM, Stephen M. Jones <smj%cirr.com@localhost> wrote:
>>> Please enable ddb (sysctl -w ddb.onpanic=1) and report the real stack
>>> trace.
>>>
>>> I've never noticed this myself, although I occasionally do large file
>>> copies in dom0.
>>
>> uvm_fault(0xffffffff80bfffc0, 0xffffffff81400000, 1) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 0 rip ffffffff804fb673 cs e030 rflags 10286 cr2 ffffffff81400028 cpl 0 rsp ffffa0006ce638d0
>> kernel: page fault trap, code=0
>> Stopped in pid 4889.1 (cp) at netbsd:pmap_kenter_pa+0x173: movq 0(%rax),%rsi
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> ds 0
>> es 0x3920
>> fs 0
>> gs 0xdd38
>> rdi 0xffffa0006752c000
>> rsi 0xcbe05000
>> rbp 0xffffa0006ce63900
>> rbx 0xcbe05
>> rdx 0x7f8000000000
>> rcx 0
>> rax 0xffffffff81400028
>> r8 0xffffffff80bab900 cpu_info_primary
>> r9 0xffffa0000615d9c0
>> r10 0xffffa00007cdf160
>> r11 0xffffa0006ce63920
>> r12 0x3
>> r13 0x7fd00033a960
>> r14 0xffffa00067a9dd38
>> r15 0xffffa0006752c000
>> rip 0xffffffff804fb673 pmap_kenter_pa+0x173
>> cs 0xe030
>> rflags 0x10286
>> rsp 0xffffa0006ce638d0
>> ss 0xe02b
>> netbsd:pmap_kenter_pa+0x173: movq 0(%rax),%rsi
>> db>
>> db> bt
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> db> trace
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> ubc_alloc() at netbsd:ubc_alloc+0x25d
>> ubc_uiomove() at netbsd:ubc_uiomove+0xba
>> ffs_write() at netbsd:ffs_write+0x5c2
>> VOP_WRITE() at netbsd:VOP_WRITE+0x2d
>> vn_write() at netbsd:vn_write+0xce
>> dofilewrite() at netbsd:dofilewrite+0x7f
>> sys_write() at netbsd:sys_write+0x72
>> syscall() at netbsd:syscall+0xb4
>> db> reboot
>> syncing disks... 12 11 done
>> unmounting file systems...
>> unmounting /proc (procfs)...uvm_fault(0xffffffff80bfffc0, 0xffffffff81400000, 1) -> e
>> fatal page fault in supervisor mode
>> trap type 6 code 0 rip ffffffff804fb673 cs e030 rflags 10282 cr2 ffffffff814000a8 cpl 6 rsp ffffa0006ce630e0
>> kernel: page fault trap, code=0
>> Stopped in pid 4889.1 (cp) at netbsd:pmap_kenter_pa+0x173: movq 0(%rax),%rsi
>> pmap_kenter_pa() at netbsd:pmap_kenter_pa+0x173
>> uvm_km_alloc() at netbsd:uvm_km_alloc+0x169
>> pool_grow() at netbsd:pool_grow+0x36
>> pool_get() at netbsd:pool_get+0x68
>> pool_cache_put_slow() at netbsd:pool_cache_put_slow+0x1d0
>> pool_cache_put_paddr() at netbsd:pool_cache_put_paddr+0xe1
>> vnfree() at netbsd:vnfree+0x5b
>> vrelel() at netbsd:vrelel+0x3f9
>> vflush() at netbsd:vflush+0x2d7
>> procfs_unmount() at netbsd:procfs_unmount+0x2b
>> dounmount() at netbsd:dounmount+0xd5
>> vfs_unmountall() at netbsd:vfs_unmountall+0x7c
>> cpu_reboot() at netbsd:cpu_reboot+0xe1
>> db_reboot_cmd() at netbsd:db_reboot_cmd+0x47
>> db_command() at netbsd:db_command+0xb0
>> db_command_loop() at netbsd:db_command_loop+0xe9
>> db_trap() at netbsd:db_trap+0xdd
>> kdb_trap() at netbsd:kdb_trap+0xc2
>> trap() at netbsd:trap+0x345
>>
So this may be drive-specific? I recently swapped out an ST3250620AS for an
HDS725050KLA360 and began to have this problem.
I did not see panics from large file copies in single-user mode, which is
how I migrated the data off the ST3250620AS, but on the HDS725050KLA360 it
is reliably repeatable: a non-root user can cause a kernel panic just by
copying a large file.
Here is the output of atactl identify for both drives, though I'm not sure
there is anything useful in it.
Model: ST3250620AS, Rev: 3.AAK, Serial #: 9QE20PS
Device type: ATA, fixed
Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 268435455
Device supports command queue depth of 31
Device capabilities:
DMA
LBA
ATA standby timer values
IORDY operation
IORDY disabling
Device supports following standards:
ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 ATA-7
Command set support:
READ BUFFER command (enabled)
WRITE BUFFER command (enabled)
Host Protected Area feature set (enabled)
look-ahead (enabled)
write cache (enabled)
Power Management feature set (enabled)
Security Mode feature set (disabled)
SMART feature set (enabled)
FLUSH CACHE EXT command (enabled)
FLUSH CACHE command (enabled)
Device Configuration Overlay feature set (enabled)
48-bit Address feature set (enabled)
SET MAX security extension (disabled)
DOWNLOAD MICROCODE command (enabled)
General Purpose Logging feature set
SMART self-test
SMART error logging
Serial ATA capabilities:
1.5Gb/s signaling
3.0Gb/s signaling
Native Command Queuing
PHY Event Counters
Serial ATA features:
Device-Initiated Interface Power Managment (disabled)
Software Settings Preservation (enabled)
---
Model: HDS725050KLA360, Rev: K2AOAB0, Serial #: KRVN65ZBHBW54
Device type: ATA, fixed
Cylinders: 16383, heads: 16, sec/track: 63, total sectors: 268435455
Device supports command queue depth of 31
Device capabilities:
DMA
LBA
ATA standby timer values
IORDY operation
IORDY disabling
Device supports following standards:
ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 ATA-7
Command set support:
READ BUFFER command (enabled)
WRITE BUFFER command (enabled)
Host Protected Area feature set (enabled)
look-ahead (enabled)
write cache (enabled)
Power Management feature set (enabled)
Security Mode feature set (disabled)
SMART feature set (enabled)
FLUSH CACHE EXT command (enabled)
FLUSH CACHE command (enabled)
Device Configuration Overlay feature set (enabled)
48-bit Address feature set (enabled)
Automatic Acoustic Management feature set (disabled)
SET MAX security extension (disabled)
SET FEATURES required to spin-up after power-up (disabled)
Power-Up In Standby feature set (disabled)
Advanced Power Management feature set (disabled)
DOWNLOAD MICROCODE command (enabled)
URG bit for WRITE STREAM DMA/PIO
URG bit for READ STREAM DMA/PIO
World Wide name
WRITE DMA/MULTIPLE FUA EXT commands
General Purpose Logging feature set
Streaming feature set
SMART self-test
SMART error logging
Serial ATA capabilities:
1.5Gb/s signaling
Native Command Queuing
Host-Initiated Interface Power Management
Serial ATA features:
Non-zero Offset DMA (disabled)
DMA Setup Auto Activate (disabled)
Device-Initiated Interface Power Managment (disabled)
In-order Data Delivery (disabled)
Software Settings Preservation (enabled)