NetBSD-Users archive
zfs resilver in(de)finite loop?
I started to see r/w errors from one of my SAS drives after
a routine zpool(8) scrub (dmesg is littered with ACK/NAK
timeout errors). Since I have a pair of spares I thought
I'd replace the drive before investigating further (the
HDD, controller, cables, and backplane are all old and
suspect).
I wasn't sure whether in such a case one should take the
problematic drive offline and resilver, or a simple replace
would do, but assumed zfs would be smart enough to do the
Right Thing as it knows about the errors. So I issued
# zpool replace pond wedges/slot4zfs wedges/slot7zfs
many hours ago. Since then, as I periodically check
zpool(8) status it appears that the various counters and
timers keep starting over, while the error rates keep
increasing. Most recently:
# zpool status
  pool: pond
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Aug 14 21:02:49 2021
        118G scanned out of 1.59T at 230M/s, 1h52m to go
        19.6G resilvered, 7.23% done
config:

        NAME                   STATE     READ WRITE CKSUM
        pond                   ONLINE       0     0     0
          raidz2-0             ONLINE       0     0     0
            wedges/slot0zfs    ONLINE       0     0     0
            wedges/slot1zfs    ONLINE       0     0     0
            wedges/slot2zfs    ONLINE       0     0     0
            wedges/slot3zfs    ONLINE       0     0     0
            replacing-4        ONLINE       0     0   945
              wedges/slot4zfs  ONLINE     299 5.07K     0  (resilvering)
              wedges/slot7zfs  ONLINE       0     0     0  (resilvering)
            wedges/slot5zfs    ONLINE       0     0     0

errors: No known data errors
which seems to show the process started most recently at
21:02 but this has been going on since midday. The only
difference I have noticed is that initially only the new
device was being resilvered but now both the old and the
new appear to be.
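One way to confirm that the resilver really is restarting, rather than
a single pass running long, would be to log the "since" timestamp from
the scan line at intervals and flag any change. A minimal sketch,
assuming a POSIX sh and the status format shown above; the function
names resilver_start and watch_resilver and the 600-second default
interval are illustrative, not part of zpool(8):

```shell
#!/bin/sh
# Sketch only: helper names and the interval are assumptions.

# Read `zpool status` text on stdin and print the resilver start
# time, i.e. whatever follows "resilver in progress since ".
# Prints nothing if no resilver is in progress.
resilver_start() {
    sed -n 's/^.*resilver in progress since //p'
}

# Poll the given pool and report whenever the start time changes,
# which would indicate that the resilver began again.
# $1 = pool name, $2 = seconds between checks (default 600).
watch_resilver() {
    pool=$1
    interval=${2:-600}
    prev=""
    while :; do
        cur=$(zpool status "$pool" | resilver_start)
        if [ -n "$prev" ] && [ -n "$cur" ] && [ "$cur" != "$prev" ]; then
            echo "resilver restarted: $prev -> $cur"
        fi
        prev=$cur
        sleep "$interval"
    done
}
```

Running `watch_resilver pond` in a spare terminal would then print a
line each time the start time moves, giving a record of how often the
process begins again.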
I'm new to ZFS and this is the first time I'm dealing with
disk errors. So I don't know if this is normal behaviour
and I should just wait or if I was wrong to issue replace
rather than take the drive offline and resilver from the
rest. If this is not normal, (how) can I recover?
Many thanks,
Pouya