NetBSD-Users archive
Re: NetBSD 9.1 upgrade and file system crash - reboot fails
Hi Martin,
On 2020-10-30 14:51:28 +0000 Martin Husemann <martin%duskware.de@localhost>
wrote:
On Fri, Oct 30, 2020 at 03:41:55PM +0100, Riccardo Mottola wrote:
A lot of errors.... and the system is not bootable anymore! I get:
NetBSD MBR boot....
Non-System disk or disk error
This is very early MBR boot sector failure, it should not be related
to the fsck issue - but maybe your disk starts to act up?
Could be... the boot part should not be affected by a
kernel/filesystem error, right? (Except for something very bad, like
out-of-partition access.)
The disk should be pretty new, but read below.
I would start checking fdisk output for the disk - is it still as
expected? Does it show a NetBSD partition with expected size?
Disk: /dev/wd0
NetBSD disklabel disk geometry:
cylinders: 155061, heads: 16, sectors/track: 63 (1008 sectors/cylinder)
total sectors: 156301488, bytes/sector: 512
BIOS disk geometry:
cylinders: 1022, heads: 240, sectors/track: 63 (15120 sectors/cylinder)
total sectors: 156301488
Partitions aligned to 15120 sector boundaries, offset 63
Partition table:
0: NetBSD (sysid 169)
start 64, size 156301424 (76319 MB, Cyls 0/1/2-10337/95/63), Active
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
Bootselector disabled.
First active partition: 0
Drive serial number: 0 (0x00000000)
disklabel:
4 partitions:
#        size     offset  fstype  [fsize bsize cpg/sgs]
 a: 151173728         64  4.2BSD       0     0     0  # (Cyl.      0*- 149973)
 b:   5127696  151173792  swap                        # (Cyl. 149974 - 155060)
 c: 156301424         64  unused       0     0        # (Cyl.      0*- 155060)
 d: 156301488          0  unused       0     0        # (Cyl.      0 - 155060)
The offset and size of c match the partition table. Is that good
enough?
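The comparison can be spelled out as plain sector arithmetic. This is only a sketch: the numbers are copied from the fdisk and disklabel output above, the function name check_layout is made up for illustration, and it assumes the usual x86 convention that c describes the NetBSD MBR partition and d the whole disk.

```python
# All values are 512-byte sectors, copied from the output above.
TOTAL_SECTORS = 156301488
MBR_START, MBR_SIZE = 64, 156301424

# disklabel entries: name -> (size, offset)
LABEL = {
    "a": (151173728, 64),
    "b": (5127696, 151173792),
    "c": (156301424, 64),
    "d": (156301488, 0),
}


def check_layout():
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    c_size, c_off = LABEL["c"]
    if (c_off, c_size) != (MBR_START, MBR_SIZE):
        problems.append("c does not match the MBR NetBSD partition")
    d_size, d_off = LABEL["d"]
    if (d_off, d_size) != (0, TOTAL_SECTORS):
        problems.append("d does not cover the whole disk")
    a_size, a_off = LABEL["a"]
    b_size, b_off = LABEL["b"]
    if a_off + a_size != b_off:
        problems.append("gap or overlap between a and b")
    if b_off + b_size != MBR_START + MBR_SIZE:
        problems.append("b does not end where the MBR partition ends")
    if MBR_START + MBR_SIZE > TOTAL_SECTORS:
        problems.append("MBR partition runs past the end of the disk")
    return problems


if __name__ == "__main__":
    print(check_layout() or "layout is consistent")
```

With these numbers every check passes: a ends exactly where b starts, and b ends at sector 156301488, which is both the end of the MBR partition and the end of the disk.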
Then compare the disklabel output, does it match?
If that is ok, install bootloader again.
I reinstalled it anyway and got the machine booting again, then did
all the checks. All important data is backed up; the only
inconvenience would be the usual reinstall and setup.
Also use atactl to check the smart status of the disk.
How reliable is that data?
I checked SMART status, it looks a little worrying:
SMART supported, SMART enabled
 id value thresh crit collect reliability description                 raw
  1    58     34 yes  online  positive    Raw read error rate         27218486
  3    96      0 yes  online  positive    Spin-up time                0
  4    95     20 no   online  positive    Start/stop count            6082
  5   100     36 yes  online  positive    Reallocated sector count    13
  7    81     30 yes  online  positive    Seek error rate             125626383
  9    95      0 no   online  positive    Power-on hours count        4752
 10   100     34 yes  online  positive    Spin retry count            0
 12    98     20 no   online  positive    Device power cycle count    2790
192    99      0 no   online  positive    Power-off retract count     2791
193    18      0 no   online  positive    Load cycle count            165436
194    37      0 no   online  positive    Temperature                 37 Lifetime min/max 0/11
195    58      0 no   online  positive    Hardware ECC Recovered      27218486
197   100      0 no   online  positive    Current pending sector      0
198   100      0 no   offline positive    Offline uncorrectable       0
199   200      0 no   online  positive    Ultra DMA CRC error count   0
200   100      0 no   offline positive    Write error rate            0
202   100      0 no   online  positive    Data address mark errors    0
13 reallocated sectors; if one of them is in the MBR, who knows? The
number of cycles and power-on hours is high, but reasonable. The read
and seek error values look incredibly high. So I thought of writing
this to a file and checking again the next day and the day after,
just to see what increases.
The day after:
SMART supported, SMART enabled
 id value thresh crit collect reliability description                 raw
  1    59     34 yes  online  positive    Raw read error rate         232650323
  3    96      0 yes  online  positive    Spin-up time                0
  4    95     20 no   online  positive    Start/stop count            6088
  5   100     36 yes  online  positive    Reallocated sector count    13
  7    81     30 yes  online  positive    Seek error rate             126691967
  9    95      0 no   online  positive    Power-on hours count        4762
 10   100     34 yes  online  positive    Spin retry count            0
 12    98     20 no   online  positive    Device power cycle count    2793
192    99      0 no   online  positive    Power-off retract count     2794
193    17      0 no   online  positive    Load cycle count            166041
194    29      0 no   online  positive    Temperature                 29 Lifetime min/max 0/11
195    59      0 no   online  positive    Hardware ECC Recovered      232650323
197   100      0 no   online  positive    Current pending sector      0
198   100      0 no   offline positive    Offline uncorrectable       0
199   200      0 no   online  positive    Ultra DMA CRC error count   0
200   100      0 no   offline positive    Write error rate            0
202   100      0 no   online  positive    Data address mark errors    0
Some of it makes sense, like 10 more hours and a few more start/stop
counts. But the number of hardware ECC errors recovered is nearly 10
times higher? The same goes for the raw read error rate, wow...
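To put numbers on the day-to-day change, the raw values can be compared directly. A minimal sketch, with the values copied by hand from the first two snapshots above; the names DAY1, DAY2 and compare are made up for illustration, and only a few attributes are included.

```python
# Raw values copied from the first two snapshots above.
DAY1 = {
    "Raw read error rate": 27218486,
    "Start/stop count": 6082,
    "Reallocated sector count": 13,
    "Seek error rate": 125626383,
    "Power-on hours count": 4752,
    "Load cycle count": 165436,
}
DAY2 = {
    "Raw read error rate": 232650323,
    "Start/stop count": 6088,
    "Reallocated sector count": 13,
    "Seek error rate": 126691967,
    "Power-on hours count": 4762,
    "Load cycle count": 166041,
}


def compare(before, after):
    """Return {attribute: (absolute change, after/before ratio)}."""
    changes = {}
    for name, old in before.items():
        new = after[name]
        changes[name] = (new - old, new / old)
    return changes


if __name__ == "__main__":
    for name, (delta, ratio) in compare(DAY1, DAY2).items():
        print(f"{name:26} {delta:+12d}  x{ratio:.2f}")
```

On these figures the raw read error rate grows by a factor of about 8.5, while the seek error rate raw value grows by under 1% and the reallocated sector count does not move.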
Then this is the data for the third day (each time I did a power-off
reboot, so it is not continuous operation; I shut down the laptop at
night):
SMART supported, SMART enabled
 id value thresh crit collect reliability description                 raw
  1    60     34 yes  online  positive    Raw read error rate         73875073
  3    96      0 yes  online  positive    Spin-up time                0
  4    95     20 no   online  positive    Start/stop count            6088
  5   100     36 yes  online  positive    Reallocated sector count    13
  7    81     30 yes  online  positive    Seek error rate             127050561
  9    95      0 no   online  positive    Power-on hours count        4771
 10   100     34 yes  online  positive    Spin retry count            0
 12    98     20 no   online  positive    Device power cycle count    2793
192    99      0 no   online  positive    Power-off retract count     2794
193    17      0 no   online  positive    Load cycle count            166675
194    28      0 no   online  positive    Temperature                 28 Lifetime min/max 0/11
195    60      0 no   online  positive    Hardware ECC Recovered      73875073
197   100      0 no   online  positive    Current pending sector      0
198   100      0 no   offline positive    Offline uncorrectable       0
199   200      0 no   online  positive    Ultra DMA CRC error count   0
200   100      0 no   offline positive    Write error rate            0
202   100      0 no   online  positive    Data address mark errors    0
The number of read errors skyrocketed!
The number of reallocated sectors remains the same, and this is the
only... reassuring thing.
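One way to look at the three snapshots together is to check the normalized values against their thresholds and to see whether the raw values behave like counters. This is a rough sketch only: the numbers are copied from the tables above, the names HISTORY and assess are made up, and it assumes the usual SMART reading that an attribute is failing once its normalized value is at or below the threshold. What the raw value of a given attribute encodes is vendor-specific and is not stated anywhere in this thread.

```python
# Per attribute: threshold, then normalized values and raw values for
# day 1, day 2 and day 3, copied from the three snapshots above.
HISTORY = {
    "Raw read error rate": (34, [58, 59, 60], [27218486, 232650323, 73875073]),
    "Reallocated sector count": (36, [100, 100, 100], [13, 13, 13]),
    "Seek error rate": (30, [81, 81, 81], [125626383, 126691967, 127050561]),
    "Load cycle count": (0, [18, 17, 17], [165436, 166041, 166675]),
}


def assess(thresh, values, raw):
    """Summarize one attribute across the snapshots."""
    # Assumption: normalized value <= threshold means "failing".
    below = any(v <= thresh for v in values)
    # A plain cumulative counter should never go down between snapshots.
    monotonic = all(a <= b for a, b in zip(raw, raw[1:]))
    return {"at_or_below_threshold": below, "raw_never_decreases": monotonic}


if __name__ == "__main__":
    for name, (thresh, values, raw) in HISTORY.items():
        print(f"{name:26} {assess(thresh, values, raw)}")
```

With these numbers no attribute is at or below its threshold, and the raw value of attribute 1 drops between the second and third snapshot (232650323 to 73875073), so it does not behave like a simple running error count.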
Riccardo