From time to time, nowadays at least once a week, my dom0 panics, which unfortunately also terminates all domUs running on that machine. That is not the desired state of things for me.

[ 236025.4127414] panic: kernel diagnostic assertion "xs->resid == xs->datalen" failed: file "/hurz/src/sys/dev/pci/mpii.c", line 3207
[ 236025.4127414] cpu0: Begin traceback...
[ 236025.4227374] vpanic() at netbsd:vpanic+0x177
[ 236025.4227374] kern_assert() at netbsd:kern_assert+0x4b
[ 236025.4227374] mpii_scsi_cmd_done() at netbsd:mpii_scsi_cmd_done+0x30b
[ 236025.4227374] mpii_intr() at netbsd:mpii_intr+0x21e
[ 236025.4227374] evtchn_do_event() at netbsd:evtchn_do_event+0x114
[ 236025.4227374] do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x167
[ 236025.4327364] Xhandle_hypervisor_callback() at netbsd:Xhandle_hypervisor_callback+0x19
[ 236025.4327364] --- interrupt ---
[ 236025.4327364] hypercall_page() at netbsd:hypercall_page+0x3aa
[ 236025.4327364] idle_loop() at netbsd:idle_loop+0x146
[ 236025.4327364] cpu0: End traceback...
[ 236025.4327364] dumping to dev 168,9 (offset=33482590, size=0): not possible
[ 236025.4327364] rebooting...
(XEN) Hardware Dom0 shutdown: rebooting machine

I do have another machine with the same controller running -current rock-solid, but directly on the hardware, with no hypervisor involved, and running and booting from zfs.

The machine I want to fix is running

NetBSD 9.99.97 (XEN3_DOM0) #4: Thu Jun 16 13:02:43 CEST 2022

built from sources on that same date. This happened before with older -currents, so I suspect this is not a -current problem, but something with either Xen or zfs.

The dom0 is running off a disk on the ciss controller (hardware RAID), but all the domUs have their virtual disks in files on a zfs filesystem. I also happen to run builds for the virtual machines in the dom0, and that is on a separate zfs filesystem.

Is there any way to find out if the crash is caused by a domU or happens in the dom0?
Should I use zvols instead of files? Should I not use zfs at all? Is there a better IT-mode controller for the HPE DL380g8? Would it be fine to use the virtual disks of the ciss0 controller as zfs pool members? Or should I switch to something else for dom0?

ciss0 at pci5 dev 0 function 0: HP Smart Array 12
ciss0: interrupting at msix5 vec 0
ciss0: 3 LDs, HW rev 1, FW 8.00/8.00, 64bit fifo rro, method perf 0x20000005
scsibus1 at ciss0: 3 targets, 1 lun per target
ciss0: normal state on 'ciss0:0' (online)
ciss0: normal state on 'ciss0:1' (online)
ciss0: normal state on 'ciss0:2' (online)
mpii0 at pci1 dev 0 function 0: Symbios Logic SAS2308 (rev. 0x05)
mpii0: interrupting at msix0 vec 0
mpii0: H220, firmware 15.10.5.0, MPI 2.0

Any help or pointer appreciated...

Cheers
Oskar