Port-xen archive
Domains stuck in "shutdown" state
I've discovered that, by shutting down a lot of domains at once, I can
get some stuck in a state like this:
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0           0      127    0  r----     29.0
unreal02           3        0    0  ---s-      4.1     9603
Attempts to "xm destroy" the domain do nothing. That one zombie was
obtained by shutting down 12 guests on my earlier-mentioned P4 with
hyperthreading disabled; if I do that with HT on, I generally wind up
with 11 of the zombies. With HT on and "sleep 25" in between the "xm
shutdown"s, maybe 6 of them.
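
For anyone trying to reproduce this, here is a minimal Python sketch that picks such domains out of the listing. It assumes the whitespace-separated column layout shown in the listing earlier in this message (Name, Id, Mem(MB), CPU, State, Time(s), optional Console) and treats any non-zero domain with the "s" flag set as a candidate zombie. The names parse_xm_list and stuck_domains are made up for illustration and are not part of xm or xend. A domain legitimately passes through the "s" state for a moment on a normal shutdown, so only one that stays there across repeated polls is really stuck.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    dom_id: int
    mem_mb: int
    cpu: int
    state: str      # five flag characters, e.g. "r----" or "---s-"
    time_s: float
    console: str    # empty when the listing shows no console port


def parse_xm_list(text: str) -> List[Domain]:
    """Parse the text of an 'xm list' listing (assumed column layout)."""
    domains = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 6 or fields[0] == "Name":
            continue
        name, dom_id, mem, cpu, state, time_s = fields[:6]
        console = fields[6] if len(fields) > 6 else ""
        domains.append(
            Domain(name, int(dom_id), int(mem), int(cpu), state,
                   float(time_s), console)
        )
    return domains


def stuck_domains(domains: List[Domain]) -> List[Domain]:
    """Return non-zero domains whose state flags include 's' (shutdown)."""
    return [d for d in domains if d.dom_id != 0 and "s" in d.state]
```

Feeding it the captured listing text (for example, saved from "xm list" to a file) a few seconds apart and comparing the results would show which domains never leave the shutdown state.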
Meanwhile, every two seconds, xend logs this:
[2005-06-23 19:54:09 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=unreal02 id=3
[2005-06-23 19:54:09 xend] DEBUG (XendDomain:247) XendDomain>reap> shutdown id=3 reason=poweroff
[2005-06-23 19:54:09 xend] DEBUG (XendDomain:487) domain_restart_schedule> 3 poweroff 0
[2005-06-23 19:54:09 xend] INFO (XendDomain:564) Destroying domain: name=unreal02
[2005-06-23 19:54:09 xend] DEBUG (XendDomainInfo:634) Closing console, domain 3
[2005-06-23 19:54:09 xend] INFO (XendRoot:112) EVENT> xend.domain.exit ['unreal02', '3', 'poweroff']
[2005-06-23 19:54:09 xend] INFO (XendRoot:112) EVENT> xend.domain.destroy ['unreal02', '3']
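
To see how often xend retries the destroy for each domain, something like the following sketch could tally the "xend.domain.destroy" events in a chunk of log text. It assumes the event format shown in the excerpt above, and it tolerates the event name and its argument list being split across lines by a mail client. The function name count_destroy_events is hypothetical, and the log text is passed in as a string rather than read from any particular path.

```python
import re
from collections import Counter

# Matches e.g.:  EVENT> xend.domain.destroy ['unreal02', '3']
# \s+ also matches a newline, so wrapped log lines are handled.
DESTROY_RE = re.compile(
    r"xend\.domain\.destroy\s+\['([^']+)',\s*'(\d+)'\]"
)


def count_destroy_events(log_text: str) -> Counter:
    """Count destroy events per (domain name, domain id) pair."""
    return Counter(
        (name, int(dom_id)) for name, dom_id in DESTROY_RE.findall(log_text)
    )
```

A count that keeps climbing for one (name, id) pair while the domain stays in the listing would match the behaviour described here.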
With ktrace/kdump I can see it doing a bunch of dom0_op hypercalls (but
of course I can't follow the pointer to see the details), and some
of them fail with ESRCH. So... it almost looks like xend is getting
confused as to which domains are actually up. (Which, incidentally, is
how I ran into the xend-restart-panic bug: I tried to restart xend to
see if that would clear things up.)
Now, if I try to restart that domain, I get this:
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0           0      127    0  r----     35.7
Domain-13         13       64    0  --p--      0.0
unreal02          13       63    0  -b---      3.6     9613
And the new copy of the host works fine (that is, it isn't blatantly
broken), although xend continues to attempt to destroy domain 3 (not 13)
every few minutes, and of course the domain list is a little screwed up.
So, if anyone who knows more about the innards of this stuff can suggest
where to look next, at least to see if it's NetBSD or xend that might be
responsible for this, that would be nice. (Though one would think that,
if it were OS-independent, someone would have noticed and fixed it.)
--
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)))))) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k))))))) '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))