Manuel Bouyer wrote:
> On Fri, Jun 09, 2006 at 07:24:23AM -0700, Jeff Rizzo wrote:
>
> Please note (in case you didn't) that the magic string is +++++ on a
> Xen kernel, and not break (the serial console is managed by xen, so we
> can't see the break).
>

Doh! I knew it was probably something like that; I should have looked harder. :} I managed to get a backtrace from the dom0 kernel this time.

> Could you try running in UP mode (I think it's 'nosmp' on the xen command
> line, or something like that) and see if it helps ? Next thing to try
> is to run a SMP Xen, but with both domains forced on cpu 0.
>

I may try that, if I can set up a situation where I can force the crash at will (since I don't really want to wait 24h each time I tweak something if I can help it). Since it seems to happen during the daily job consistently, I will see if running them from the commandline will trigger the hang.

> Also, you could try using 'q' after ^A^A^A, to see the state of
> domains, and other usefull infos (the NetBSD dom0 kernel should print
> a few things too, it can be an indication on how hard it's hung)
>

Below is the backtrace from ddb, and the output from the Xen kernel. (I don't know anything about the Xen output - I assume the apparently-interlaced-with-other-stuff output is due to both dom0 and Xen outputting.)

Stopped at netbsd:cpu_Debugger+0x4: leave
db> bt
cpu_Debugger(6dcc1c80,c09e2000,c09e2150,1,10) at netbsd:cpu_Debugger+0x4
xencons_tty_input(c0a6dc00,c055a930,1,10,7) at netbsd:xencons_tty_input+0xa9
xencons_intr(c0a6dc00,c062ab1c,0,c0aec100,0) at netbsd:xencons_intr+0x47
evtchn_do_event(4,c062ab1c,0,ab24,0) at netbsd:evtchn_do_event+0x9f
do_hypervisor_callback(c062ab1c,0,3b9a0011,31,11) at netbsd:do_hypervisor_callback+0xad
hypervisor_callback(c0574c80,0,0,c02f472d,c0575000) at netbsd:hypervisor_callback+0x64
cpu_switch(c0575000,0,cbcd7000,c02a99fe,c054fbc0) at netbsd:cpu_switch+0xd7
ltsleep(c0574c80,4,c04d464f,0,0) at netbsd:ltsleep+0x427
uvm_scheduler(c0573288,0,c0572b18,c04b80d4,c037351c) at netbsd:uvm_scheduler+0xaa
main(c0100177,c010017f,0,0,0) at netbsd:main+0x4f1
db>

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'q' pressed -> dumping task queues (now=0x547C:6E183AC1)
(XEN) Xen: DOM 0, CPU 0 [has=T] flags=106d refcnt=2 nr_pages=49135 xenheap_pages=2
(XEN) Shared_info@00be6000: caf=80000003, taf=f0000003
(XEN) Guest: upcall_pend = 00, upcall_mask = 00
(XEN) Notifying guest...
MdXeEbNu) Xegn :eve nDOt 3i_i, CPlUe v1el 0[hxac sci_i=peT]n difngla 0x83g0s= 10ci_0fide prtehf c1nt= kr_paegevstchn_u=pc6al5l_pen536ding xe n0 heevatpc_hpan_upcgaell_msa=2s (X1 EevN)t chSn_hpared_einfo@0n0dinbdgd0_se00l: caf= 080x0 evtchn00_m003, atsakf =f00ff00ff90503 b(2X ffENf) Gufffefsft f:f uffpfcffaf fllfffff_pffe ndf f= 0f0,f ufpfcff allffff_mafskfff = ffff 0f0fff ff f(XEfffN)f f Noftifyifng fguffefsfft .fff..
fffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
evtchn_pending 1410 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

and for good measure (not sure if it's useful here), here's the dump of the run queues:

(XEN) Scheduler: Borrowed Virtual Time (bvt)
(XEN) BVT: mcu=0x000186A0ns ctx_allow=0x004C4B40ns NOW=0x000054FDC442960F
(XEN) CPU[00] svt=0x3D1C6C6C QUEUE rq fcffd120 n: fcffc084, p: fcffc278
(XEN) 0: 32767 has=F mcua=10 ev=0xFFFFFFFF av=0xFFFFFFFF c=0x4E2DEE8EE4F9
(XEN) l: fcffc084 n: fcffc278 p: fcffd120
(XEN) 1: 0 has=T mcua=10 ev=0x3D1C6C6C av=0x3D1C6C6C c=0x6CFDD13C31C
(XEN) l: fcffc278 n: fcffd120 p: fcffc084
(XEN) CPU[01] svt=0x88F35DCC QUEUE rq fcffd140 n: fcffc214, p: fcffc214
(XEN) 0: 32767 has=T mcua=10 ev=0xFFFFFFFF av=0xFFFFFFFF c=0x3FE7740AFDE1
(XEN) l: fcffc214 n: fcffd140 p: fcffd140

Unfortunately, I never set up a dump device on this machine, so I can't get a crash dump. (Would that even help?)

thanks,
+j