At Fri, 16 Apr 2021 11:44:08 +0100, David Brownlee <abs%netbsd.org@localhost> wrote:
Subject: Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0
>
> On Fri, 16 Apr 2021 at 08:41, Greg A. Woods <woods%planix.ca@localhost> wrote:
>
> > What else is different? What am I missing? What could be different in
> > NetBSD current that could cause a FreeBSD domU to (mis)behave this way?
> > Could the fault still be in the FreeBSD drivers -- I don't see how as
> > the same root problem caused corruption in both HVM and PVH domUs.
>
> Random data collection thoughts:
>
> - Can you reproduce it on tiny partitions (to speed up testing)
> - If you newfs, shutdown the DOMU, then copy off the data from the
>   DOM0 does it pass FreeBSD fsck on a native boot
> - Alternatively if you newfs an image on a native FreeBSD box and copy
>   to the DOM0 does the DOMU fsck fail
> - Potentially based on results above - does it still happen with a
>   reboot between the newfs and fsck
> - Can you ktrace whichever of newfs or fsck to see exactly what its
>   writing (tiny *tiny* filesystem for the win here :)

So, the root filesystem is clean (from the factory, and verified by at
least NetBSD's fsck as OK), but when '-f' is used it is found to be
corrupt.

Unfortunately I don't have any real FreeBSD machines available (though I
could possibly get it installed on my MacBookPro again, but that's
probably a multi-day effort at this point).

However I've just found a way to reproduce the problem reliably and with
a working comparison with a matching-sized memory disk.

First off attach a tiny 4mb LVM LV to FreeBSD -- that's the smallest LV
possible apparently:

    dom0 # lvm lvs
      LV          VG      Attr   LSize   Origin Snap%  Move Log Copy%  Convert
      build       scratch -wi-a- 250.00g
      fbsd-test.0 scratch -wi-a-  30.00g
      fbsd-test.1 scratch -wi-a-  30.00g
      nbtest.pkg  vg0     -wi-a-  30.00g
      nbtest.root vg0     -wi-a-  30.00g
      nbtest.swap vg0     -wi-a-   8.00g
      nbtest.var  vg0     -wi-a-  10.00g
      tinytest    vg0     -wi-a-   4.00m
    dom0 # xl block-attach fbsd-test format=raw, vdev=sdc, access=rw, target=/dev/mapper/vg0-tinytest

Now a run of the test on the FreeBSD domU (first showing the kernel
seeing the device attachment):

    # xbd3: 4MB <Virtual Block Device> at device/vbd/2080 on xenbusb_front0
    xbd3: attaching as da2
    xbd3: features: flush
    xbd3: synchronize cache commands enabled.
    GEOM: new disk da2

    # dd if=/dev/zero of=tinytest.fs count=8192
    8192+0 records in
    8192+0 records out
    4194304 bytes transferred in 0.081106 secs (51713998 bytes/sec)
    # mdconfig -a -t vnode -f tinytest.fs
    md0
    # newfs -o space -n md0
    /dev/md0: 4.0MB (8192 sectors) block size 32768, fragment size 4096
            using 4 cylinder groups of 1.03MB, 33 blks, 256 inodes.
    super-block backups (for fsck_ffs -b #) at:
     192, 2304, 4416, 6528
    # newfs -o space -n da2
    /dev/da2: 4.0MB (8192 sectors) block size 32768, fragment size 4096
            using 4 cylinder groups of 1.03MB, 33 blks, 256 inodes.
    super-block backups (for fsck_ffs -b #) at:
     192, 2304, 4416, 6528
    # dumpfs da2 >da2.dumpfs
    # dumpfs md0 >md0.dumpfs
    # diff md0.dumpfs da2.dumpfs
    1,2c1,2
    < magic 19540119 (UFS2) time Fri Apr 16 18:48:55 2021
    < superblock location 65536 id [ 6079dc17 1006b3b4 ]
    ---
    > magic 19540119 (UFS2) time Fri Apr 16 18:49:57 2021
    > superblock location 65536 id [ 6079dc55 348e5947 ]
    27c27
    < magic 90255 tell 20000 time Fri Apr 16 18:48:55 2021
    ---
    > magic 90255 tell 20000 time Fri Apr 16 18:49:57 2021
    40c40
    < magic 90255 tell 128000 time Fri Apr 16 18:48:55 2021
    ---
    > magic 90255 tell 128000 time Fri Apr 16 18:49:57 2021
    53c53
    < magic 90255 tell 230000 time Fri Apr 16 18:48:55 2021
    ---
    > magic 90255 tell 230000 time Fri Apr 16 18:49:57 2021
    66c66
    < magic 90255 tell 338000 time Fri Apr 16 18:48:55 2021
    ---
    > magic 90255 tell 338000 time Fri Apr 16 18:49:57 2021
    # fsck md0
    ** /dev/md0
    ** Last Mounted on
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    1 files, 1 used, 870 free (14 frags, 107 blocks, 1.6% fragmentation)

    ***** FILE SYSTEM IS CLEAN *****
    # fsck da2
    ** /dev/da2
    ** Last Mounted on
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ROOT INODE UNALLOCATED
    ALLOCATE? [yn] n

    ***** FILE SYSTEM MARKED DIRTY *****

So I ktraced the fsck_ufs run, and though I haven't looked at it with a
fine-tooth comb and the source open, the only thing that seems a wee bit
different about what fsck does is that it opens the device twice, with
O_RDONLY, then shortly before it prints the first "** /dev/da2" line it
reopens it O_RDWR a third time, closes the second one, and calls dup()
on the third one so that it has the same FD# as the second open had.
Otherwise it does a few reads of different sizes (all multiples of 512,
none larger than 64kb), sometimes read()+lseek() and sometimes pread(),
and some from each descriptor.

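The mix of read()+lseek() and pread() calls described above could be
checked in isolation with something like the following. This is only a
rough sketch, assuming Python is available on the domU; the function
names are made up for illustration, and it does not reproduce the actual
sequence of opens and reads that fsck_ffs performs. It simply reads the
same offsets through two descriptors, one path per descriptor, and
reports where the returned data differs.

```python
import os

SECTOR = 512  # every read in the trace was a multiple of 512 bytes


def read_with_seek(fd, offset, size):
    """Read `size` bytes at `offset` using lseek() followed by read()."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, size)


def compare_read_paths(path, size=64 * 1024, limit=None):
    """Return the offsets where pread() and lseek()+read() disagree.

    Two separate read-only descriptors are used, loosely mirroring the
    multiple opens seen in the trace (an assumption, not a replay).
    `size` is the chunk size; `limit` caps how many bytes are checked
    (None means keep going until both paths hit end of file/device).
    """
    if size % SECTOR:
        raise ValueError("size must be a multiple of the sector size")
    mismatches = []
    fd_seek = os.open(path, os.O_RDONLY)
    fd_pread = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while limit is None or offset < limit:
            a = read_with_seek(fd_seek, offset, size)
            b = os.pread(fd_pread, size, offset)
            if not a and not b:
                break  # both paths report end of file/device
            if a != b:
                mismatches.append(offset)
            offset += size
    finally:
        os.close(fd_seek)
        os.close(fd_pread)
    return mismatches
```

Calling compare_read_paths("/dev/da2") and compare_read_paths("/dev/md0")
on the domU should each give an empty list if both read paths return the
same bytes; any offsets reported for da2 but not md0 would point at the
block device path rather than at fsck itself.
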
Maybe that's the big difference -- it uses pread(2).

It also appears to never explicitly close the third open, the one that
was dup()ed to replace the second open, so I think the likes of valgrind
would call that a leaked FD. :-)

If I use "newfs -O1" then the symptoms change a bit, but most
importantly the filesystem can be checked from the NetBSD dom0, and it
checks cleanly, until FreeBSD fsck is run on the domU and marks it
dirty, then NetBSD immediately sees the dirty flag, but no other damage.

So I'm still not sure how this could be related to simply updating the
dom0 NetBSD kernel (and Xen, but I've now gone through 4.11 and 4.13 and
they both (mis)behave the same way) -- and this was a change which did
not visibly affect any NetBSD domUs as they are happily serving and
building alongside these tests.

All of the above on its own would smell more like a FreeBSD bug
somewhere in their Xen blkfront driver, but it would have to be pretty
deep since the initial corruption I encountered was in a full HVM domU.

All of this worked A-OK before, most recently I believe with an 8.99.32
kernel and Xen 4.8 (and definitely with 7.99 and 4.5 before that).

-- 
                                        Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>