So, with regard to the performance issue, i.e. the peculiar difference between two Dell PE servers of different vintages: I think I can finally say with some confidence that the problem lies entirely in FreeBSD and in how it deals with whatever CPU, etc. it is handed. So far this performance difference seems to be completely unrelated both to FreeBSD's xbd(4) devices and to the other, far more critical, I/O corruption issue with xbd(4) devices and filesystems.

What I really need help with now is figuring out the latter issue -- that's the one that completely prevents me from restoring some production and test FreeBSD systems on my upgraded Xen servers.

If you've got a Xen system running a NetBSD dom0, could you please try booting a recent FreeBSD ISO and/or memstick.img? You can use either type='hvm', or type='pvh' with an extracted kernel (I use the latter because it's much simpler, but both exhibit the same I/O problems). If that works, then try making a filesystem on a scratch volume assigned to the FreeBSD domU as a raw Xen VBD (virtual block device), and then fsck that new filesystem before even mounting it.

Note that unless you're using ZFS or LVM or real disk partitions this can be tricky, because it is not safe to export more than one file-backed device to a domU at a time (PR#53385). The problems I'm seeing are with LVM volumes and with file-backed disks, and to use a file-backed disk I've copied the mini-memstick.img file to an LVM volume. I don't have any spare real partitions on any of these systems. Hmmm... maybe I can attach a USB drive to my local server....

Note also that the ISO image I'm using is one that I extracted from the original ISO image and then re-packed with mkisofs(8) (from sysutils/cdrtools).
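A minimal sketch of the domU side of that test, untested as written. It assumes the scratch volume is exported as vdev=sda, which FreeBSD attaches as da0 (as with the fbsd-test-pvh.conf and the LVM boot log in this message); check dmesg for the real device name. The scratch_check function name and the DRY_RUN switch are illustrative only. fsck_ffs -n answers "no" to every question, so the check only reports damage rather than trying to repair it:

```sh
# Hypothetical helper (not part of any FreeBSD tool): create a fresh UFS
# filesystem on a scratch VBD and fsck it before it is ever mounted.
# Run as root in the FreeBSD domU, e.g. from the installer's shell.
# Set DRY_RUN=1 to print the commands without running them.
scratch_check() {
    # $1 is the scratch disk, e.g. /dev/da0 for a VBD exported as vdev=sda
    for cmd in "newfs $1" "fsck_ffs -f -n $1"; do
        echo "+ $cmd"
        if [ -z "${DRY_RUN:-}" ]; then
            # $cmd is deliberately unquoted so it splits into command + args
            $cmd || return 1
        fi
    done
}

# Example (destroys anything already on /dev/da0!):
#   scratch_check /dev/da0
```

On a healthy setup the fsck_ffs pass over a filesystem that newfs has only just written should report no errors at all.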

Without doing this, the original ISO reports weird GPT and I/O errors (though it will boot and run the installer):

    xbd0: 345MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
    xbd0: attaching as ada0
    xbd0: features: flush
    xbd0: synchronize cache commands enabled.
    GEOM: new disk ada0
    xn0: backend features: feature-sg
    arc4random: no preloaded entropy cache
    Trying to mount root from cd9660:iso9660/12_2_RELEASE_AMD64_BO [ro]...
    GEOM: ada0: corrupt or invalid GPT detected.
    GEOM: ada0: GPT rejected -- may not be recoverable.
    GEOM: iso9660/12_2_RELEASE_AMD64_BO: corrupt or invalid GPT detected.
    GEOM: iso9660/12_2_RELEASE_AMD64_BO: GPT rejected -- may not be recoverable.
    cd9660: RockRidge Extension
    g_vfs_done():iso9660/12_2_RELEASE_AMD64_BO[READ(offset=362156032, length=2048)]error = 5
    RRIP without PX field?
    g_vfs_done():iso9660/12_2_RELEASE_AMD64_BO[READ(offset=362156032, length=2048)]error = 5
    g_vfs_done():iso9660/12_2_RELEASE_AMD64_BO[READ(offset=362156032, length=2048)]error = 5
    xen_et0: providing initial system time
    start_init: trying /sbin/init
    g_vfs_done():iso9660/12_2_RELEASE_AMD64_BO[READ(offset=362156032, length=2048)]error = 5

The FreeBSD mini-memstick.img file won't mount at all when attached as a file-backed disk; it can't seem to decipher the label:

    xbd0: 386MB <Virtual Block Device> at device/vbd/768 on xenbusb_front0
    xbd0: attaching as ada0
    xbd0: features: flush
    xbd0: synchronize cache commands enabled.
    GEOM: new disk ada0
    xn0: backend features: feature-sg
    Trying to mount root from ufs:/dev/ada0 [ro]...
    GEOM_PART: partition 2 has end offset beyond last LBA: 791120 > 790527
    GEOM_PART: integrity check failed (ada0, MBR)
    Mounting from ufs:/dev/ada0 failed with error 2; retrying for 3 more seconds
    arc4random: no preloaded entropy cache
    arc4random: no preloaded entropy cache
    Mounting from ufs:/dev/ada0 failed with error 2; retrying for 2 more seconds
    Mounting from ufs:/dev/ada0 failed with error 2; retrying for 1 more second
    Mounting from ufs:/dev/ada0 failed with error 2.

    Loader variables:
      vfs.root.mountfrom=ufs:/dev/ada0
      vfs.root.mountfrom.options=ro

    Manual root filesystem specification:
      <fstype>:<device> [options]
          Mount <device> using filesystem <fstype>
          and with the specified (optional) option list.

        eg. ufs:/dev/da0s1a
            zfs:zroot/ROOT/default
            cd9660:/dev/cd0 ro
              (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

      ?               List valid disk boot devices
      .               Yield 1 second (for background tasks)
      <empty line>    Abort manual input

    mountroot> ?

    List of GEOM managed disk devices:
      ada0

    mountroot>

However, if I copy the image file to an LVM volume and export that, then it works:

    xbd0: 30720MB <Virtual Block Device> at device/vbd/2048 on xenbusb_front0
    xbd0: attaching as da0
    xbd0: features: flush
    xbd0: synchronize cache commands enabled.
    GEOM: new disk da0

    mountroot> ?

    List of GEOM managed disk devices:
      ufs/FreeBSD_Install ufsid/5f927268239ae98e da0s2a msdosfs/EFISYS da0s2 da0s1 da0

    mountroot> ufs:/dev/da0s2a ro
    Trying to mount root from ufs:/dev/da0s2a []...
    xen_et0: providing initial system time
    start_init: trying /sbin/init

Back to the performance issue, here are some numbers. I ran each test four or five times on each server, and these examples are consistent with, and close to, the averages.

On my "fast" server (the newer one), once all files are in cache, copying them all to /dev/null is quite fast:

    # time -l sh -c 'find / -type f -print | xargs cat > /dev/null '
            2.92 real         0.10 user         2.43 sys
          5156  maximum resident set size
             9  average shared memory size
             4  average unshared data size
           261  average unshared stack size
          1594  page reclaims
             0  page faults
             0  swaps
             0  block input operations
             0  block output operations
             0  messages sent
             0  messages received
             0  signals received
            93  voluntary context switches
           348  involuntary context switches

However, on the "slow" server, even with all files in cache, copying them to /dev/null takes considerable CPU time and effort in both userland and the kernel:

    # time -l sh -c 'find / -type f -print | xargs cat > /dev/null '
            9.95 real         0.99 user        10.95 sys
          5184  maximum resident set size
            13  average shared memory size
             4  average unshared data size
           227  average unshared stack size
          1709  page reclaims
             0  page faults
             0  swaps
             0  block input operations
             0  block output operations
             0  messages sent
             0  messages received
             0  signals received
           182  voluntary context switches
           398  involuntary context switches

This is with the same ISO image mounted as root, the exact same xl.cfg(5) file, and a shell started from the installer. Note too that if I/Os are needed to read the files (i.e. they are not all already in cache), FreeBSD does correctly report the number of "block input operations" -- and of course on the slower server that first pass takes even longer than it does on the fast server.
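To put numbers on "considerable", the ratios between the two runs can be worked out with a trivial script. The figures are copied from the two time -l outputs above; the ratio and cpu_vs_real helper names are just for illustration:

```sh
# Figures copied from the two `time -l` runs above
# (fast = newer server, slow = older server).
fast_real=2.92 fast_user=0.10 fast_sys=2.43
slow_real=9.95 slow_user=0.99 slow_sys=10.95

# ratio A B: print A/B to one decimal place (awk does the float math).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# cpu_vs_real USER SYS REAL: total CPU seconds per wall-clock second.
cpu_vs_real() {
    awk -v u="$1" -v s="$2" -v r="$3" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

echo "real: slow/fast = $(ratio "$slow_real" "$fast_real")"
echo "user: slow/fast = $(ratio "$slow_user" "$fast_user")"
echo "sys:  slow/fast = $(ratio "$slow_sys" "$fast_sys")"
echo "fast CPU/real   = $(cpu_vs_real "$fast_user" "$fast_sys" "$fast_real")"
echo "slow CPU/real   = $(cpu_vs_real "$slow_user" "$slow_sys" "$slow_real")"
```

That works out to roughly 3.4x the wall-clock time, 9.9x the user time and 4.5x the system time on the slow server. On the slow server user+sys (11.94s) also exceeds real time (9.95s), so more than one vCPU must have been busy at once, presumably with find, xargs and cat overlapping in the pipeline.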

Even after a bunch of these tests on each server, the CPU seconds show that the fast one does all this with about 1/10th the effort of the slow one:

    <newer>
    # xl list
    Name                          ID    Mem VCPUs   State     Time(s)
    Domain-0                       0   1497     1   r-----    58934.1
    central                        4   4000     1   -b----   125219.4
    b2                           137  10000    16   r-----   943713.3
    fbsd-test                    147   1936     4   -b----      119.9

    <older>
    # xl list
    Name                          ID    Mem VCPUs   State     Time(s)
    Domain-0                       0   4096     4   r-----     1848.0
    fbsd-test                     11   1936     4   -b----     1268.4

On the slow server I've tried disabling as many hardware vulnerability mitigation features as I dare. Specifically, I disable the Xen kernel features with:

    pv-l1tf=off,domu=off spec-ctrl=no-xen,l1d-flush=off

    # cat /boot.cfg
    menu=Boot Xen:load /netbsd-XEN3_DOM0 -v bootdev=dk0 console=xencons;multiboot /xen bootscrub=false dom0_mem=4G console=com1,vga console_timestamps=datems dom0_max_vcpus=4 dom0_vcpus_pin=true pv-l1tf=off,domu=off vpmu=on cpuid=rdrand spec-ctrl=no-xen,l1d-flush=off guest_loglvl=all
    menu=Boot Xen (previous kernels):load /netbsd-XEN3_DOM0- -v bootdev=dk0 console=xencons;multiboot /xen- bootscrub=false dom0_mem=4G console=com1,vga console_timestamps=datems dom0_max_vcpus=4 dom0_vcpus_pin=true pv-l1tf=off,domu=off l1d-flush=off vpmu=on
    menu=Boot GENERIC normally:rndseed /etc/entropy-file;boot /netbsd-GENERIC -v
    menu=Boot GENERIC previous kernel:rndseed /etc/entropy-file;boot /netbsd-GENERIC- -v
    menu=Boot GENERIC single user:rndseed /etc/entropy-file;boot /netbsd-GENERIC -vxs
    menu=Drop to boot prompt:prompt
    default=1
    timeout=15

And I disable a FreeBSD feature with "vm.pmap.pti=0":

    # cat /etc/xen/fbsd-test-pvh.conf
    type = "pvh"
    name = "fbsd-test"
    memory = 2000
    maxmem = 8000
    vcpus = 4
    vif = [ 'bridge=bridge0' ]
    kernel = "/build/images/freebsd-12.2-kernel"
    cmdline = 'vfs.root.mountfrom=cd9660:iso9660/remade_install,vfs.root.mountfrom.options=ro,boot_verbose=YES,vm.pmap.pti=0'
    disk = [ 'format=raw, vdev=sda, access=rw, target=/dev/mapper/scratch-fbsd--test.0',
             'format=raw, vdev=hda, access=ro, target=/build/images/Repacked-FreeBSD-12.2-RELEASE-amd64-bootonly.iso',
           ]

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>