Subject: Re: HDD - SMART status
To: Tomasz Luchowski <tomasz@luchowski.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: current-users
Date: 04/26/2003 15:42:18
On Sat, Apr 26, 2003 at 01:54:16AM +0200, Tomasz Luchowski wrote:
> Hi,
>
> Does this mean the HDD is going to die soon? I had several problems with it
> already, but they would always go away after moving the disk physically
> by just some millimeters (some bizarre interaction.)
>
> I am getting the usual several second "freeze" after each read error.
> This is i386 running -current as of 13th April (no, I don't think I suffered
> from UFS2 problems).
>
> I knew it had to be finally replaced some day, but I'd like to make sure
> whether it's gotten really bad now.
>
> atactl wd0 smart status says:
>
> SMART supported, SMART enabled
> id value thresh crit collect reliability description
> 1 75 34 yes online positive Raw read error rate
> 3 86 0 yes online positive Spin-up time
> 4 100 20 no online positive Start/stop count
> 5 100 36 yes online positive Reallocated sector count
> 7 77 30 yes online positive Seek error rate
> 9 98 0 no online positive Power-on hours count
> 10 100 97 yes online positive Spin retry count
> 12 99 20 no online positive Device power cycle count
> 194 42 0 no online positive Temperature
> 195 75 0 no online positive
> 197 100 0 no online positive Current pending sector
> 198 100 0 no offline positive Offline uncorrectable
> 199 200 0 no online positive Ultra DMA CRC error count
> 200 100 0 no offline positive
> 202 100 0 no online positive
From this, it shouldn't be bad yet. It would be useful to watch how the
various values move (how fast the "Raw read error rate" value decreases,
for example).
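One way to watch the values over time is to save the "atactl wd0 smart status"
output periodically and compare snapshots. The following is only a rough
sketch in Python, assuming the column layout shown in the quoted output; the
names parse_smart, margins and changes are made up for illustration and are
not part of atactl. It relies on the usual SMART convention that a normalized
value at or below its threshold marks the attribute as failed, so the margin
(value minus thresh) is the number to keep an eye on.

```python
import re

# One attribute row, in the column order of the table quoted above:
# id value thresh crit collect reliability [description]
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(yes|no)\s+(online|offline)\s+(\w+)\s*(.*)$"
)


def parse_smart(text):
    """Map attribute id -> dict of value, thresh, crit and description."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header line, banner, or anything else
        attr_id, value, thresh, crit, _collect, _reliability, desc = m.groups()
        attrs[int(attr_id)] = {
            "value": int(value),
            "thresh": int(thresh),
            "crit": crit == "yes",
            "desc": desc.strip(),
        }
    return attrs


def margins(attrs):
    """Distance of each normalized value above its threshold."""
    return {i: a["value"] - a["thresh"] for i, a in attrs.items()}


def changes(old, new):
    """Change in value between two snapshots; unchanged attributes are omitted."""
    return {
        i: new[i]["value"] - old[i]["value"]
        for i in new
        if i in old and new[i]["value"] != old[i]["value"]
    }


# Three rows copied from the output quoted above.
SAMPLE = """\
SMART supported, SMART enabled
id value thresh crit collect reliability description
1 75 34 yes online positive Raw read error rate
10 100 97 yes online positive Spin retry count
195 75 0 no online positive
"""

today = parse_smart(SAMPLE)
for attr_id, margin in sorted(margins(today).items()):
    print(attr_id, today[attr_id]["desc"] or "(no description)", margin)
```

Calling changes(yesterday, today) on two saved snapshots then shows which
attributes moved and by how much. A steadily shrinking margin on an attribute
marked "crit" would be the thing to worry about.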
>
> in syslog:
>
> Apr 26 01:42:48 zunpc /netbsd: pciide0:0:0: lost interrupt
> Apr 26 01:43:00 zunpc /netbsd: type: ata tc_bcount: 16384 tc_skip: 0
> Apr 26 01:43:00 zunpc /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x21
> Apr 26 01:43:00 zunpc /netbsd: wd0e: DMA error reading fsbn 13069678 of 13069678-13069709 (wd0 bn 13069741; cn 12966 tn 0 sn 13), retrying
> Apr 26 01:43:00 zunpc /netbsd: wd0: soft error (corrected)
This looks more like a problem on the bus than with the disk itself.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 24 years of experience will always make the difference