With the gracious help of RVP <rvp%SDF.ORG@localhost> I have been able
to identify more precisely what is actually going wrong with FreeBSD's
access to NetBSD dom0 xbdback(4) storage.

It seems that in certain circumstances (e.g. in newfs and the test
program) whenever FreeBSD issues a read of more than 1024 bytes only the
first 1024 bytes are correct -- the rest of the bytes returned come from
somewhere else on the disk, apparently starting six (6) sectors after
where they were supposed to have come from.  The data expected at byte
1024 belongs two sectors into the read, so six sectors beyond that is
eight sectors, i.e. exactly 4096 bytes, from the beginning of the read.

I.e. it looks like reads over 1024 bytes in size result in something
like the following:

	lseek(fd, offset, SEEK_SET);
	read(fd, &buf[0], 1024);
	lseek(fd, offset+4096, SEEK_SET);  // or: lseek(fd, 4096-1024, SEEK_CUR);
	read(fd, &buf[1024], N-1024);

Note that on FreeBSD hw.pagesize is 4096 and its blkfront device driver
seems to use PAGE_SIZE as its unit of request.

The mystery remains why all is OK when the dom0 runs NetBSD-8.99.32, but
broken when it runs NetBSD-9.99.81.

So, how did I come to this observation?

RVP supplied me with a little program to do random seeks and reads of
sectors, create a transportable data file, and then verify these reads
again using the data file.  I modified it to do random-sized reads (in
sizes of small powers of two blocks), and then I ran it first on the
NetBSD dom0 to generate the initial data file, and then on a FreeBSD
domU to check against the data file.  I used a device which I had
initialised with sectors of data where each byte in the sector is the
sector's offset (its sector number), modulo 256.

The program also currently chooses randomly between lseek()+read() and
pread() when doing the verification step, though I don't know if that is
important, except see the note about dd(1) below.
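
The arithmetic behind that observation can be checked without touching
the device.  What follows is only a minimal sketch in POSIX sh.  It
assumes 512-byte sectors, the sector-number-modulo-256 fill pattern
described above, and the suspected (unconfirmed) behaviour that
everything past the first 1024 bytes is fetched starting 4096 bytes after
the start of the read.  The function names are made up for illustration
and are not part of blkchk:

```sh
#!/bin/sh
# Hypothetical helpers for illustration only; not part of blkchk.c.
# Assumes 512-byte sectors, each filled with (sector number % 256).

# pattern_byte OFFSET
# Print (in hex) the value every byte of the sector holding OFFSET should have.
pattern_byte() {
	printf '%02x\n' $(( ($1 / 512) % 256 ))
}

# suspected_byte START INDEX
# Print (in hex) the value that would end up in buf[INDEX] for a read
# starting at byte START, *if* the hypothesis is right: the first 1024
# bytes come from START, the remainder from START + 4096 onward.
suspected_byte() {
	if [ "$2" -lt 1024 ]; then
		pattern_byte $(( $1 + $2 ))
	else
		pattern_byte $(( $1 + 4096 + $2 - 1024 ))
	fi
}

start=28141568000
echo "expected  buf[1024]: $(pattern_byte $(( start + 1024 )))"
echo "suspected buf[1024]: $(suspected_byte "$start" 1024)"
```

For an 8192-byte read at offset 28141568000 this prints 22 for the
expected value and 28 for the suspected one, which is the same pair of
values blkchk reports for that read.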

As a recap:

- This test program effectively reproduces a failure similar to that of
  "newfs && fsck" on the FreeBSD domU, but shows us how the data looks
  when and where there are any problems.

- This problem does not happen in NetBSD domUs (including with the same
  LVM LV backed device and the same blkchk program).

- This problem does not happen if the dom0 is running NetBSD 8.99.32
  (but does happen with 9.99.81).

- Quite surprisingly this problem does not seem to happen with dd(1):

	# dd if=/dev/da0 bs=512 skip=32 count=32 | od -c > /var/tmp/512
	32+0 records in
	32+0 records out
	16384 bytes transferred in 0.046450 secs (352727 bytes/sec)
	# dd if=/dev/da0 bs=16384 skip=1 count=1 | od -c > /var/tmp/16384
	1+0 records in
	1+0 records out
	16384 bytes transferred in 0.002174 secs (7536189 bytes/sec)
	# diff /var/tmp/512 /var/tmp/16384
	#

  (see also a matching example from blkchk below)

  ktrace of FreeBSD dd shows it does lseek() and read()s.

  Note it is impossible to test every example from the random data with
  dd(1), because dd(1) cannot skip in units other than its (input) block
  size.

The awk program to initialize the device is here:

	https://github.com/robohack/experiments/blob/master/tblocks.awk

The block checker is here:

	https://github.com/robohack/experiments/blob/master/blkchk.c

So this program does something a little more easily observable than
newfs does, but it's still not quite obvious what all the preconditions
are, given that one-off example tests with dd do not reproduce the same
symptoms.
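
Since dd(1) can only skip whole input blocks, only those samples whose
offset is a multiple of the read size can be replayed with it.  Here is
a small sketch in POSIX sh that checks an (offset, size) pair and prints
the equivalent dd command when one exists.  The function name dd_for is
hypothetical, and /dev/da0 is simply the device used in the examples in
this message:

```sh
#!/bin/sh
# Hypothetical helper for illustration only.
# dd_for OFFSET SIZE [DEVICE]
# Print a dd(1) command that reads SIZE bytes at byte OFFSET in a single
# block, or fail if OFFSET is not a whole number of SIZE-byte blocks.
dd_for() {
	dev=${3:-/dev/da0}
	if [ $(( $1 % $2 )) -ne 0 ]; then
		echo "offset $1 is not a multiple of $2; cannot replay with dd" >&2
		return 1
	fi
	echo "dd if=$dev bs=$2 skip=$(( $1 / $2 )) count=1"
}

dd_for 28141568000 8192			# aligned: prints a dd command
dd_for 19355076608 65536 || true	# not aligned: diagnostic on stderr
```

Of the two reads tried here only the 8192-byte one at 28141568000 can be
expressed this way; the 65536-byte read at 19355076608 starts 2048 bytes
past a 65536-byte boundary.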

An example of its output in the failure scenario on FreeBSD, where it
shows that for every read larger than 1024 bytes, the byte at offset
1024 (and every byte after it) does not match what was read from the
same offset on the NetBSD dom0:

	# ./blkchk check /dev/da0 ckfile.txt
	blkchk: read 65536 bytes @ 19355076608: mismatch: /dev/da0[+1024] \x8c != ckfile.txt[ln#1][1024] \x86
	blkchk: pread 32768 bytes @ 10718762496: mismatch: /dev/da0[+1024] \xb3 != ckfile.txt[ln#2][1024] \xad
	blkchk: read 32768 bytes @ 5913347072: mismatch: /dev/da0[+1024] \x4a != ckfile.txt[ln#9][1024] \x44
	blkchk: read 32768 bytes @ 25366386688: mismatch: /dev/da0[+1024] \x34 != ckfile.txt[ln#11][1024] \x2e
	blkchk: pread 32768 bytes @ 25685901824: mismatch: /dev/da0[+1024] \xe9 != ckfile.txt[ln#14][1024] \xe3
	blkchk: read 8192 bytes @ 23109372416: mismatch: /dev/da0[+1024] \x8d != ckfile.txt[ln#15][1024] \x87
	blkchk: read 32768 bytes @ 20043252224: mismatch: /dev/da0[+1024] \xe9 != ckfile.txt[ln#18][1024] \xe3
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1024] \x28 != /var/tmp/ckfile.bigones[ln#17][1024] \x22

Here's effectively the same test as the last line, but done with dd(1)
(it has an offset that is evenly divisible by the read size, making it
suitable for dd) and we don't see any problem:

	# echo 28141568000 / 8192 | bc -l
	3435250.00000000000000000000
	# dd if=/dev/da0 bs=8192 iseek=3435250 count=1 2>/dev/null | od -h
	0000000 2020 2020 2020 2020 2020 2020 2020 2020
	*
	0001000 2121 2121 2121 2121 2121 2121 2121 2121
	*
	0002000 2222 2222 2222 2222 2222 2222 2222 2222
	*
	0003000 2323 2323 2323 2323 2323 2323 2323 2323
	*
	0004000 2424 2424 2424 2424 2424 2424 2424 2424
	*
	0005000 2525 2525 2525 2525 2525 2525 2525 2525
	*
	0006000 2626 2626 2626 2626 2626 2626 2626 2626
	*
	0007000 2727 2727 2727 2727 2727 2727 2727 2727
	*
	0010000 2828 2828 2828 2828 2828 2828 2828 2828
	*
	0011000 2929 2929 2929 2929 2929 2929 2929 2929
	*
	0012000 2a2a 2a2a 2a2a 2a2a 2a2a 2a2a 2a2a 2a2a
	*
	0013000 2b2b 2b2b 2b2b 2b2b 2b2b 2b2b 2b2b 2b2b
	*
	0014000 2c2c 2c2c 2c2c 2c2c 2c2c 2c2c 2c2c 2c2c
	*
	0015000 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d
	*
	0016000 2e2e 2e2e 2e2e 2e2e 2e2e 2e2e 2e2e 2e2e
	*
	0017000 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f 2f2f
	*
	0020000

Let's try that again with just the one sample data line and blkchk:

	# grep 28141568000 /var/tmp/ckfile.txt > /var/tmp/ckfile.1
	# /var/tmp/blkchk check /dev/da0 /var/tmp/ckfile.1
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1024] \x28 != /var/tmp/ckfile.1[ln#0][1024] \x22
	#

Every byte from offset 1024 onward is different, but I'll cut the output
off at 10 lines:

	# /var/tmp/blkchk check -v /dev/da0 /var/tmp/ckfile.1 2>&1 | head
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1024] \x28 != /var/tmp/ckfile.1[ln#0][1024] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1025] \x28 != /var/tmp/ckfile.1[ln#0][1025] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1026] \x28 != /var/tmp/ckfile.1[ln#0][1026] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1027] \x28 != /var/tmp/ckfile.1[ln#0][1027] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1028] \x28 != /var/tmp/ckfile.1[ln#0][1028] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1029] \x28 != /var/tmp/ckfile.1[ln#0][1029] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1030] \x28 != /var/tmp/ckfile.1[ln#0][1030] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1031] \x28 != /var/tmp/ckfile.1[ln#0][1031] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1032] \x28 != /var/tmp/ckfile.1[ln#0][1032] \x22
	blkchk: pread 8192 bytes @ 28141568000: mismatch: /dev/da0[+1033] \x28 != /var/tmp/ckfile.1[ln#0][1033] \x22

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>