Matthias Petermann <mp%petermann-it.de@localhost> writes:

> I run a NetBSD-based NAS at home. It is currently running on NetBSD 9.1.

Probably you should bring it forward along netbsd-9, but that's likely
unrelated.

> The system is booted from a USB stick on which the root file system is
> also located. The storage is on 4 x 4 TB magnetic hard disks, configured
> as ZFS RAIDZ2.
>
> Earlier I noticed that the I/O performance of the system suddenly
> collapsed drastically. A look at the syslog gives a pretty clear
> indication of the reason:
>
> [ 87240.313853] wd2: (uncorrectable data error)
> [ 87240.313853] wd2d: error reading fsbn 5707914328 of 5707914328-5707914455 (wd2 bn 5707914328; cn 5662613 tn 6 sn 46)
> [ 87465.637977] wd2d: error reading fsbn 5710464152 of 5710464152-5710464215 (wd2 bn 5710464152; cn 5665143 tn 0 sn 8), xfer 338, retry 0
> [ 87465.637977] wd2: (uncorrectable data error)
> [ 87475.561683] wd2: soft error (corrected) xfer 338
> [ 87506.393194] wd2d: error reading fsbn 5710555128 of 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer 40, retry 0
> [ 87506.393194] wd2: (uncorrectable data error)
> [ 87515.156465] wd2d: error reading fsbn 5710555128 of 5710555128-5710555255 (wd2 bn 5710555128; cn 5665233 tn 4 sn 12), xfer 40, retry 1

You seem to be having both correctable and uncorrectable errors.

> The whole syslog is full of these messages. What surprises me is that
> there are "uncorrectable" data errors in the syslog. Nevertheless, the

Why? These messages are the OS reading a block from wd2 and getting a
notification from the controller that the block could not be read. This
happens as disks become troubled, and I've seen it often over the years
(over many systems; it's not often on any given system).

> data can still be read - albeit very slowly. My assumption was that the

You have to separate "can be read from wd2" and "can be read from the
zfs raidz2".

> redundancies of RAID2 are being used to compensate for the defects.
> To my surprise, ZFS does not seem to have noticed any of these defects:

I think you may have uncovered a bug in zfs statistics.

>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             dk0     ONLINE       0     0     0
>             dk1     ONLINE       0     0     0
>             dk2     ONLINE       0     0     0
>             dk3     ONLINE       0     0     0

It really seems like dk2 (assuming dk2 == wd2) should have some read
errors.

> Another indication that ZFS has not yet noticed the error: with top,
> there is no significant CPU load during I/O, neither in the user nor
> the system area. I would have expected this at least in the case when
> ZFS works with redundancies.

It's more or less xor for raidz1, so compared to disk read times, I'd
expect no real cpu hit. I am unclear on raidz2, but surely it's not
public key crypto. The corresponding operation is already being done on
every write to create the redundant bits.

This may be slightly helpful, merely interesting, or neither:

  https://queue.acm.org/detail.cfm?id=1670144

> So it looks like the hardware error can still be corrected as far as
> possible at the level of the device driver, which makes me doubt the
> truth of the statement "uncorrectable data error".

What I do for each of my (physical) disks, spinning and ssd, is run
(x86-centric; the "c" partition on other ports), once every few months:

  dd if=/dev/rwd0d of=/dev/null bs=1m

and see if that throws any errors. If there is one, I try to read that
block a few times, and generally will then 1) take that as a sign to
replace the disk (or move it to an nth external backup) and 2) write
that sector, so that it gets reallocated. If the disk is part of a
raid1, I can write it with good data; if not, I write zeros and fsck. I
am a big fan of replacing disks that show errors, but sometimes one
can't, and that's my workaround.

> Does anyone know what would have to happen for ZFS to notice the
> hardware defect?

I bet zfs got a failed read and did the reconstruction, but didn't log
it. That's just a guess, though, and it would be a good thing to figure
out.
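If it helps, here is roughly how I'd script that periodic read check (a
sketch only -- the device list and messages are mine, not anything
standard; adjust the names to your own disks, and remember you need
read access to the raw devices):

```shell
# Sketch of the periodic whole-disk read check described above.
# The device list is an assumption -- substitute your own wd*/sd*
# disks (raw "d" partition on x86 NetBSD, "c" on most other ports).
scan_disk() {
    # Read every block; dd exits non-zero if the driver reports a
    # read error anywhere on the device.
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: read errors -- time to think about replacing it"
    fi
}

for disk in /dev/rwd0d /dev/rwd1d /dev/rwd2d; do
    if [ -e "$disk" ]; then
        scan_disk "$disk"
    fi
done
```

Run from cron every few months, any "read errors" line is your cue to
retry the block, rewrite it, and start shopping for a replacement.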
> saturn$ doas atactl wd2 smart status
> SMART supported, SMART enabled
> id value thresh crit collect reliability description                raw
>   1   197     51 yes  online  positive  Raw read error rate        38669
>   3   176     21 yes  online  positive  Spin-up time                6158
>   4   100      0 no   online  positive  Start/stop count             510
>   5   200    140 yes  online  positive  Reallocated sector count       0
>   7   200      0 no   online  positive  Seek error rate                0
>   9    64      0 no   online  positive  Power-on hours count       26740
>  10   100      0 no   online  positive  Spin retry count               0
>  11   100      0 no   online  positive  Calibration retry count        0
>  12   100      0 no   online  positive  Device power cycle count     506
> 192   200      0 no   online  positive  Power-off retract count       99
> 193   200      0 no   online  positive  Load cycle count            2672
> 194   117      0 no   online  positive  Temperature                   33
> 196   200      0 no   online  positive  Reallocated event count        0
> 197   200      0 no   online  positive  Current pending sector        18

This is the big deal. The drive has decided that 18 sectors are not ok.
It will reallocate them when they are written, but reads of them are
returned as uncorrectable so that the data loss is not silent from the
OS's point of view.

> 198   100      0 no   offline positive  Offline uncorrectable          0
> 199   200      0 no   online  positive  Ultra DMA CRC error count      0
> 200   100      0 no   offline positive  Write error rate               0

Probably if you take that drive out, put it in a test box, write zeros
to the whole drive, and then read it back, it will be sort of ok, but I
wouldn't trust it.
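As an aside, that attribute is easy enough to watch from a cron job. A
minimal sketch of pulling the raw value out of the atactl output -- the
awk expression assumes the raw value is the last column, as in your
listing, and for illustration it parses a line copied from your mail
rather than making a live atactl call:

```shell
# Extract the raw "Current pending sector" count from atactl SMART
# output. The sample line is copied from the listing above; on a live
# system you would instead pipe in:
#   doas atactl wd2 smart status
smart_line='197   200      0 no   online  positive  Current pending sector        18'

pending=$(printf '%s\n' "$smart_line" | awk '/Current pending sector/ { print $NF }')
echo "pending sectors: $pending"
```

Anything above zero, or a count that keeps growing between checks, is a
good reason to schedule the replacement sooner rather than later.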