Port-xen archive
Re: NetBSD DomU MP freeze under Linux Dom0
Manuel Bouyer wrote:
> On Thu, Sep 06, 2012 at 12:57:19PM +0200, Roger Pau Monne wrote:
>> Hello,
>>
>> Recently I've been running some benchmarks on NetBSD to compare the
>> performance of NetBSD and Linux as Dom0/DomUs (this was presented
>> at XenSummit last week with Cherry G. Mathew; slides will probably be
>> uploaded soon).
>>
>> One of the benchmarks consisted of running build.sh inside a DomU, and
>> during this test I realised that this leads to a freeze when running a
>> Linux Dom0 and a NetBSD DomU with 4 vcpus. So far I haven't been able to
>> reproduce the problem without MP or under a NetBSD Dom0, which is kind of
>> strange, because I would say it is not related to blkfront: I've added
>> some debugging prints there, and blkfront does not seem to be the owner of
>> the lock when the freeze happens. The build of NetBSD inside the DomU
>> was using 8 simultaneous jobs, and it freezes to the point where I cannot
>> even access ddb. I was able to get a trace using gdbsx:
>>
>> Thread 4:
>>
>> #0 0xffffffff80101248 in hypercall_page ()
>> #1 0x000000000000e033 in ?? ()
>> #2 0x0000000000000000 in ?? ()
>>
>> Thread 3:
>>
>> #0 0xffffffff80130f32 in x86_pause ()
>> #1 0xffffffff801f67b1 in _kernel_lock ()
>> #2 0xffffffff8030b054 in bdev_strategy ()
>> #3 0xffffffff803037d8 in spec_strategy ()
>> #4 0xffffffff803a719a in VOP_STRATEGY ()
>> #5 0xffffffff8035ff7a in ufs_strategy ()
>> #6 0xffffffff803a719a in VOP_STRATEGY ()
>> #7 0xffffffff8038d3fa in bwrite ()
>> #8 0xffffffff803a6320 in VOP_BWRITE ()
>> #9 0xffffffff80357125 in ufs_dirremove ()
>> #10 0xffffffff8035dc47 in ufs_remove ()
>> #11 0xffffffff803a6b53 in VOP_REMOVE ()
>> #12 0xffffffff8039ac4f in do_sys_unlink ()
>> #13 0xffffffff8032b044 in syscall ()
>> #14 0xffffffff8010221d in Xsyscall ()
>>
>> Thread 2:
>>
>> #0 0xffffffff801f67b1 in _kernel_lock ()
>> #1 0xffffffff8030b054 in bdev_strategy ()
>> #2 0xffffffff803037d8 in spec_strategy ()
>> #3 0xffffffff803a719a in VOP_STRATEGY ()
>> #4 0xffffffff8035ff7a in ufs_strategy ()
>> #5 0xffffffff803a719a in VOP_STRATEGY ()
>> #6 0xffffffff8038d3fa in bwrite ()
>> #7 0xffffffff803a6320 in VOP_BWRITE ()
>> #8 0xffffffff80357125 in ufs_dirremove ()
>> #9 0xffffffff8035dc47 in ufs_remove ()
>> #10 0xffffffff803a6b53 in VOP_REMOVE ()
>> #11 0xffffffff8039ac4f in do_sys_unlink ()
>> #12 0xffffffff8032b044 in syscall ()
>> #13 0xffffffff8010221d in Xsyscall ()
>>
>> Thread 1:
>>
>> #0 0xffffffff801f67b1 in _kernel_lock ()
>> #1 0xffffffff8030b054 in bdev_strategy ()
>> #2 0xffffffff803037d8 in spec_strategy ()
>> #3 0xffffffff803a719a in VOP_STRATEGY ()
>> #4 0xffffffff8035ff7a in ufs_strategy ()
>> #5 0xffffffff803a719a in VOP_STRATEGY ()
>> #6 0xffffffff8038d3fa in bwrite ()
>> #7 0xffffffff803a6320 in VOP_BWRITE ()
>> #8 0xffffffff80357125 in ufs_dirremove ()
>> #9 0xffffffff8035dc47 in ufs_remove ()
>> #10 0xffffffff803a6b53 in VOP_REMOVE ()
>> #11 0xffffffff8039ac4f in do_sys_unlink ()
>> #12 0xffffffff8032b044 in syscall ()
>> #13 0xffffffff8010221d in Xsyscall ()
>>
>> My guess is that Thread 4 is holding the lock and is blocked for some
>> reason that's beyond my current knowledge of NetBSD internals, and the
>> stack trace is not helping with that.
>
> Do you have a way to know what hypercall thread 4 is doing?
> It looks like it's doing a hypercall with the kernel_lock held,
> and this hypercall blocks.

I'm not so sure this is related to Xen. I've been trying to debug this;
in the case above the hypercall was a do_console_io, but I've been
seeing many more of these crashes, and they all seem to be related to
the filesystem (probably related to the bug that I've emailed to
tech-kern, "Panic when deleting large number of files inside DomU").
Here is another crash; this time the hypercall is a do_sched_op_compat:

Thread 4:

#0 0xffffffff801010ca in hypercall_page ()
#1 0xffffffff807db030 in ?? ()
#2 0x0000000000000001 in ?? ()
#3 0xffffffff803b03ee in xenconscn_getc ()
#4 0xffffffff8013be10 in db_readline ()
#5 0xffffffff8013c934 in db_read_line ()
#6 0xffffffff80139eb5 in db_command_loop ()
#7 0xffffffff8013f43d in db_trap ()
#8 0xffffffff8013c7da in kdb_trap ()
#9 0xffffffff8034a525 in trap ()
#10 0xffffffff8010340f in calltrap ()
#11 0xffffffff80130bf5 in breakpoint ()
#12 0xffffffff803172f1 in vpanic ()
#13 0xffffffff80317410 in panic ()
#14 0xffffffff803a2ae6 in wapbl_register_deallocation ()
#15 0xffffffff8015ef1b in ffs_indirtrunc ()
#16 0xffffffff8015eec2 in ffs_indirtrunc ()
#17 0xffffffff8015eec2 in ffs_indirtrunc ()
#18 0xffffffff8016007f in ffs_truncate ()
#19 0xffffffff803575ef in ufs_inactive ()
#20 0xffffffff803a817d in VOP_INACTIVE ()
#21 0xffffffff8039f28c in vrelel ()
#22 0xffffffff8039c31c in do_sys_stat ()
#23 0xffffffff8039c3c9 in sys___lstat50 ()
#24 0xffffffff8032c2e4 in syscall ()
#25 0xffffffff8010221d in Xsyscall ()

Thread 3:

#0 0xffffffff8013c58f in ddb_suspend ()
#1 0xffffffff8013c898 in ddb_ipi ()
#2 0xffffffff803abae6 in xen_ipi_ddb ()
#3 0xffffffff803aba91 in xen_ipi_handler ()
#4 0xffffffff8014bc9b in evtchn_do_event ()
#5 0xffffffff801027ed in call_evtchn_do_event ()
#6 0xffffffff8017b76d in do_hypervisor_callback ()
#7 0xffffffff80105bae in hypervisor_callback ()
#8 0x00000000deadbeef in ?? ()
#9 0x00000000deadbeef in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 2:

#0 0xffffffff8013c58f in ddb_suspend ()
#1 0xffffffff8013c898 in ddb_ipi ()
#2 0xffffffff803abae6 in xen_ipi_ddb ()
#3 0xffffffff803aba91 in xen_ipi_handler ()
#4 0xffffffff8014bc9b in evtchn_do_event ()
#5 0xffffffff801027ed in call_evtchn_do_event ()
#6 0xffffffff8017b76d in do_hypervisor_callback ()
#7 0xffffffff80105bae in hypervisor_callback ()
#8 0x00000000deadbeef in ?? ()
#9 0x00000000deadbeef in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 1:

#0 0xffffffff8013c58f in ddb_suspend ()
#1 0xffffffff8013c898 in ddb_ipi ()
#2 0xffffffff803abae6 in xen_ipi_ddb ()
#3 0xffffffff803aba91 in xen_ipi_handler ()
#4 0xffffffff8014bc9b in evtchn_do_event ()
#5 0xffffffff801027ed in call_evtchn_do_event ()
#6 0xffffffff8017b76d in do_hypervisor_callback ()
#7 0xffffffff80105bae in hypervisor_callback ()
#8 0x00000000deadbeef in ?? ()
#9 0x00000000deadbeef in ?? ()
#10 0x0000000000000000 in ?? ()
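
As an aside, backtraces in this format are easy to summarise with a script. The following is only a minimal sketch, not a tool used in this thread: the helper name threads_in_function is hypothetical, and the regular expressions assume the exact "Thread N:" / "#N 0xADDR in func ()" layout shown in the traces above (with optional ">" quote prefixes). It reports which threads have a given function somewhere in their stack, for example which vcpus are waiting in _kernel_lock.

```python
import re
from collections import defaultdict

# Assumed layout: "Thread N:" headers followed by "#N 0xADDR in func ()" frames,
# optionally prefixed by mail quote markers such as ">> ".
THREAD_RE = re.compile(r"^[>\s]*Thread (\d+):")
FRAME_RE = re.compile(r"^[>\s]*#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \(\)")


def threads_in_function(trace_text, function):
    """Return the sorted thread ids whose backtrace contains `function`."""
    frames = defaultdict(list)
    current = None
    for line in trace_text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            frames[current].append(m.group(2))
    return sorted(t for t, names in frames.items() if function in names)


# Shortened excerpt of the first trace in this thread.
SAMPLE = """\
Thread 4:
#0 0xffffffff80101248 in hypercall_page ()
#1 0x000000000000e033 in ?? ()
Thread 3:
#0 0xffffffff80130f32 in x86_pause ()
#1 0xffffffff801f67b1 in _kernel_lock ()
Thread 2:
#0 0xffffffff801f67b1 in _kernel_lock ()
#1 0xffffffff8030b054 in bdev_strategy ()
"""

print(threads_in_function(SAMPLE, "_kernel_lock"))
```

Run against the full first trace, the same call would list every thread except the one sitting in hypercall_page.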

This time I was also able to get a ddb session; here is the output:

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80130bf5 cs e030 rflags 246 cr2 7f7ff7b1f000 cpl 0 rsp ffffa0005b03b490
Stopped in pid 1425.1 (find) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_indirtrunc() at netbsd:ffs_indirtrunc+0x35b
ffs_indirtrunc() at netbsd:ffs_indirtrunc+0x302
ffs_indirtrunc() at netbsd:ffs_indirtrunc+0x302
ffs_truncate() at netbsd:ffs_truncate+0x9a4
ufs_inactive() at netbsd:ufs_inactive+0x2df
VOP_INACTIVE() at netbsd:VOP_INACTIVE+0x33
vrelel() at netbsd:vrelel+0x1bb
do_sys_stat() at netbsd:do_sys_stat+0x78
sys___lstat50() at netbsd:sys___lstat50+0x26
syscall() at netbsd:syscall+0xc4
ds 4000
es b4d0
fs 100
gs 7500
rdi 0
rsi d
rbp ffffa0005b03b490
rbx 104
rdx 0
rcx 8
rax 1
r8 0
r9 1
r10 0
r11 1180
r12 ffffffff80427780 copyright+0x22f20
r13 ffffa0005b03b4d0
r14 4000
r15 772
rip ffffffff80130bf5 breakpoint+0x5
cs e030
rflags 246
rsp ffffa0005b03b490
ss e02b

The filesystem was clean, since I had just created it with newfs -O 2.
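
On the earlier question of how to tell which hypercall a thread is in: one way is to look at the frame #0 address inside hypercall_page. The sketch below assumes the standard x86 PV layout, where the hypercall page is page-aligned and each hypercall stub occupies 32 bytes, so the offset within the page divided by 32 gives the hypercall number. The function name hypercall_from_address is hypothetical, and the table is only a small subset of the __HYPERVISOR_* numbers from Xen's public xen.h.

```python
# Sketch: map an address inside hypercall_page to a Xen hypercall number.
# Assumptions: hypercall_page is page-aligned (4 KiB) and every hypercall
# stub is 32 bytes long, as in the x86 PV ABI.
HYPERCALL_STUB_SIZE = 32
PAGE_OFFSET_MASK = 0xFFF

# Small subset of __HYPERVISOR_* numbers from Xen's public xen.h.
HYPERCALL_NAMES = {
    6: "sched_op_compat",
    18: "console_io",
    29: "sched_op",
    32: "event_channel_op",
}


def hypercall_from_address(addr):
    """Return (number, name) for an address that lies inside hypercall_page."""
    offset = addr & PAGE_OFFSET_MASK
    number = offset // HYPERCALL_STUB_SIZE
    return number, HYPERCALL_NAMES.get(number, "unknown")


# Frame #0 addresses of Thread 4 in the two traces above.
for addr in (0xffffffff80101248, 0xffffffff801010ca):
    number, name = hypercall_from_address(addr)
    print(f"{addr:#x}: hypercall {number} ({name})")
```

Under these assumptions the two addresses decode to hypercalls 18 (console_io) and 6 (sched_op_compat), which agrees with the do_console_io and do_sched_op_compat identified above.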