tech-kern archive
Re: Xen 3.3: Problem HVM guest
On Friday 15 August 2008 10:18:31 Christoph Egger wrote:
> On Thursday 14 August 2008 23:39:23 Christoph Egger wrote:
> > Manuel Bouyer wrote:
> > > On Thu, Aug 14, 2008 at 08:25:14PM +0200, Christoph Egger wrote:
> > >>> Not really, as the write which is failing is also in dom0 (so on the
> > >>> same CPU). I think the tlb should be properly invalidated. Just to
> > >>> make sure you can try adding
> > >>> pmap_tlb_shootdown(pmap, va, 0, opte);
> > >>> just after xpq_update_foreign() in pmap_enter_ma(). But as we're
> > >>> switching pmaps on return to userland, this shouldn't be needed.
> > >>
> > >> This has no impact.
> > >
> > > As expected ... I'm running out of ideas. I'll try to reproduce this
> > > on my test box, but it won't be before next week.
> >
> > I found the bug:
> > >>>>> - instrument privpgop_fault() to see if it gets called at all for
> > >>>>> this mapping, and if it's doing the right thing.
> > >>>>> There should be only one page in this object, and the machine
> > >>>>> address should be 0 (pobj->maddr[maddr_i])
> > >>>>
> > >>>> Yes, privpgop_fault() is called. It looks like it's called in a
> > >>>> loop. npages = 1 and machine address is 0.
> > >>>
> > >>> OK, it has the right data. I guess it's called in a loop because
> > >>> writing at the page keeps failing.
> >
> > Writing at the page keeps failing because privpgop_fault()
> > does not handle this case:
> >
> >     if (pobj->maddr[maddr_i] == 0)
> >             continue; /* this has already been flagged as error */
> >
> > Removing this check makes privpgop_fault() call pmap_enter_ma(),
> > which makes the write access finally succeed, and the HVM guest
> > starts.
> >
> > May I commit this change?
>
> The story is not over yet. When running an HVM guest, the machine
> suddenly freezes with this message:
>
> Mutex error: mutex_spin_retry: locking against myself
>
> lock address : 0xffffffff80b86a80
> current cpu : 0
> current lwp : 0xffffa000257e47e0
> owner field : 0x0000000000010700 wait/spin: 0/1
>
> The machine freezes completely: no keyboard interrupt, no serial console,
> and no network. The machine can't be pinged from outside.
>
>
> What I figured out so far:
>
> a) I can only reproduce this with / on NFS. (So is this NetBSD/Xen
>    specific?)
> b) The values are always the same.
A LOCKDEBUG Dom0 kernel panics with this:
Mutex error: mutex_vector_enter: locking against myself
lock address : 0xffffa00023206f48 type : spin
initialized : 0xffffffff803f5dee
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 1
current cpu : 0 last held: 0
current lwp : 0xffffa000260247c0 last held: 000000000000000000
last locked : 0xffffffff80407e00 unlocked : 0xffffffff80407e7e
owner field : 0x0000000000010700 wait/spin: 0/1
panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff804b936d cs e030 rflags 246 cr2 ffffa000256d5000 cpl 8 rsp ffffa000260b7088
Stopped in pid 457.1 (qemu-dm) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x255
lockdebug_abort1() at netbsd:lockdebug_abort1+0xd3
mutex_vector_enter() at netbsd:mutex_vector_enter+0x1f0
sleepq_remove() at netbsd:sleepq_remove+0x107
cv_wakeup_all() at netbsd:cv_wakeup_all+0x81
knote_activate() at netbsd:knote_activate+0x84
knote() at netbsd:knote+0x36
selnotify() at netbsd:selnotify+0x25
logwakeup() at netbsd:logwakeup+0x3f
printf() at netbsd:printf+0xfc
xen_correctable_handler() at netbsd:xen_correctable_handler+0x25
Xresume_xenev8() at netbsd:Xresume_xenev8+0x55
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
mi_switch() at netbsd:mi_switch+0x12e
sleepq_block() at netbsd:sleepq_block+0xa0
selcommon() at netbsd:selcommon+0x738
sys_select() at netbsd:sys_select+0x6a
syscall() at netbsd:syscall+0x98
ds 0x7414
es 0x7098
fs 0x7414
gs 0x8
rdi 0x8
rsi 0xdeadbeef
rbp 0xffffa000260b7088
rbx 0xffffa000260b7098
rdx 0
rcx 0
rax 0x1
r8 0xffffa000260b6fa8
r9 0x1
r10 0xffffa000260b6fa8
r11 0xffffffff804f5560 xenconscn_putc
r12 0x100
r13 0xffffffff80867414 copyright+0x1a254
r14 0x8
r15 0x1
rip 0xffffffff804b936d breakpoint+0x5
cs 0xe030
rflags 0x246
rsp 0xffffa000260b7088
ss 0xe02b
netbsd:breakpoint+0x5: leave
db> ps /l
 PID   LID S   FLAGS   STRUCT LWP *     NAME     WAIT
>457     3 3 1000084 ffffa00025d067c0  qemu-dm  aiowork
         2 3 1000084 ffffa00026024000  qemu-dm  netio
>        1 3   40084 ffffa000260247c0  qemu-dm  select
[...]
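One possible reading of the backtrace: the LWP is inside sleepq_block()/mi_switch() when the Xen event interrupt arrives. xen_correctable_handler() then calls printf(), which goes through logwakeup() and selnotify() down to sleepq_remove(), and that tries to enter a spin mutex the same LWP appears to hold already. The following is only a toy model of that pattern, not NetBSD's mutex code. All toy_* names are made up for illustration, and the lock is deliberately single-threaded (no atomics), since the point is only the owner check.

```c
#include <stddef.h>

/* Hypothetical toy lock; this is NOT NetBSD's kmutex_t. */
struct toy_spin {
    const void *owner;          /* NULL while the lock is free */
};

enum toy_result {
    TOY_ACQUIRED,
    TOY_BUSY,                   /* held by somebody else: a real lock would spin */
    TOY_LOCKING_AGAINST_MYSELF  /* held by the caller: spinning would never end */
};

/* Try to take the lock on behalf of `self` (any unique pointer). */
static enum toy_result
toy_spin_tryenter(struct toy_spin *l, const void *self)
{
    if (l->owner == self)
        return TOY_LOCKING_AGAINST_MYSELF;
    if (l->owner != NULL)
        return TOY_BUSY;
    l->owner = self;
    return TOY_ACQUIRED;
}

/* Release the lock if `self` is the owner. */
static void
toy_spin_exit(struct toy_spin *l, const void *self)
{
    if (l->owner == self)
        l->owner = NULL;
}

/*
 * Stand-in for an interrupt handler that runs on top of the interrupted
 * context and needs the same lock (in the trace: printf -> ... ->
 * sleepq_remove).
 */
static enum toy_result
toy_interrupt_handler(struct toy_spin *l, const void *self)
{
    return toy_spin_tryenter(l, self);
}

/* Take the lock, get "interrupted" while holding it, report the result. */
static enum toy_result
toy_simulate_reentry(void)
{
    struct toy_spin lock = { NULL };
    int lwp;                    /* its address serves as the caller identity */
    enum toy_result r;

    if (toy_spin_tryenter(&lock, &lwp) != TOY_ACQUIRED)
        return TOY_BUSY;
    r = toy_interrupt_handler(&lock, &lwp);
    toy_spin_exit(&lock, &lwp);
    return r;
}
```

In this model toy_simulate_reentry() reports the self-lock instead of spinning forever. That would be consistent with the non-LOCKDEBUG kernel freezing hard while the LOCKDEBUG kernel panics, but whether this is what really happens here still needs to be confirmed.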
Christoph