Subject: NetBSD 3.0_BETA & RAIDframe problems
To: None <netbsd-help@netbsd.org>
From: Teemu Rinta-aho <teemu@rinta-aho.org>
List: netbsd-help
Date: 06/15/2005 14:28:19
Hi all,

I have a Pentium 4 machine which has worked fine with
a Hitachi Deskstar 120GB SATA disc for over a year,
running NetBSD-current, among other (operating) systems.

Now I started to build a RAID-1 set with the same machine,
planning to move the disks to my new server, which is
still on its way from the Mini-ITX store.

I bought two 200GB Seagate Barracuda 7200.7 SATA discs
and connected them to the P4. Following the instructions
in the NetBSD guide, I installed NetBSD 3.0_BETA onto
the first disk, configured RAID-1 on the second disk,
transferred the installed root file system onto it,
booted from the second disk, and let RAIDframe
reconstruct the mirror onto the first one.

However, I ran into problems. Reconstruction of the
"failed" component doesn't seem to work. I get read
errors from either wd0 or wd1. Then I try to reconstruct,
reboot, and the other disc complains about the same
sector. It seems very unlikely that both discs have a
bad sector in exactly the same position.
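
A side note on the sector numbers: the dmesg excerpts further down report the same block for both wd0 and wd1. The short Python sketch below (all names are only illustrative) redoes the arithmetic from those lines using the geometry shown in the disklabels. The whole-disk block is the partition-relative block plus the partition offset of 63, it agrees with the reported cylinder/track/sector, and it equals 2^28 - 1, the highest sector number a 28-bit LBA can express. Whether that is related to the failure is only a guess.

```python
# Values copied from the dmesg error lines and the disklabel output.
FSBN = 268435392          # block number relative to the start of wd0a / wd1a
PARTITION_OFFSET = 63     # offset of partition "a" in the disklabel
HEADS = 16                # tracks per cylinder
SECTORS_PER_TRACK = 63
CYL, TRACK, SECTOR = 266305, 0, 15   # "cn 266305 tn 0 sn 15"
SECT_PER_SU = 128         # sectPerSU from the component label


def absolute_block(fsbn, offset):
    """Partition-relative block number -> whole-disk block number."""
    return fsbn + offset


def chs_to_block(cyl, track, sector):
    """Convert the cylinder/track/sector triple from the log to a block number."""
    return (cyl * HEADS + track) * SECTORS_PER_TRACK + sector


bn = absolute_block(FSBN, PARTITION_OFFSET)
lba28_max = 2**28 - 1             # largest sector number a 28-bit LBA can express
last_bn = bn + SECT_PER_SU - 1    # last block of the failing 128-sector read

print("disk block:", bn)
print("matches CHS:", bn == chs_to_block(CYL, TRACK, SECTOR))
print("equals 2^28 - 1:", bn == lba28_max)
print("read extends past 28-bit range:", last_bn > lba28_max)
```

The failing request starts exactly at the last 28-bit sector and continues beyond it, which may be worth mentioning when asking about the controller or driver.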

After the first boot from the raid0, I got a message
complaining that raid0e (/export) has an incorrect
super block. Then I just did newfs on that and the
problem was gone (the fs was empty). I don't know
what caused that.

I'd be grateful for any information on whether the problem
is in the RAIDframe code, my hardware or the NetBSD guide.

What I did differently was that I used FFSv2 and
soft dependencies on the file systems. Is that illegal
with RAIDframe?

You can find more information on my system below.

BR,
Teemu Rinta-aho


SYSTEM
======

NetBSD p4.rinta-aho.org 3.0_BETA NetBSD 3.0_BETA (GENERIC) #10: Fri Jun 10 06:15:05 EEST 2005  root@backup.nomadiclab.com:/export/netbsd/netbsd-3.x/obj/sys/arch/i386/compile/GENERIC i386


BOOT DMESG
==========

piixide1 at pci0 dev 31 function 2
piixide1: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using irq 5 for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel configured to native-PCI mode
atabus3 at piixide1 channel 1

wd0 at atabus2 drive 0: <ST3200822AS>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)

wd1 at atabus3 drive 0: <ST3200822AS>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(piixide1:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA)

raid0: RAID Level 1
raid0: Components: /dev/wd0a /dev/wd1a[**FAILED**]
raid0: Total Sectors: 390721792 (190782 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
raid0: Error re-writing parity!


FSTAB
=====

/dev/raid0a / ffs rw,softdep 1 1
/dev/raid0b none swap sw 0 0
/dev/raid0e /export ffs rw,softdep 1 2
/dev/wd0b none swap dp 0 0
kernfs /kern kernfs rw
procfs /proc procfs rw,noauto


PARTITION TABLES
================

p4# fdisk wd0
Disk: /dev/rwd0d
NetBSD disklabel disk geometry:
cylinders: 387621, heads: 16, sectors/track: 63 (1008 sectors/cylinder)
total sectors: 390721968

BIOS disk geometry:
cylinders: 1023, heads: 81, sectors/track: 63 (5103 sectors/cylinder)
total sectors: 390721968

Partition table:
0: NetBSD (sysid 169)
    start 63, size 390721905 (190782 MB, Cyls 0-76567/9/1), Active
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
Bootselector disabled.

p4# fdisk wd1
Disk: /dev/rwd1d
NetBSD disklabel disk geometry:
cylinders: 387621, heads: 16, sectors/track: 63 (1008 sectors/cylinder)
total sectors: 390721968

BIOS disk geometry:
cylinders: 1023, heads: 81, sectors/track: 63 (5103 sectors/cylinder)
total sectors: 390721968

Partition table:
0: NetBSD (sysid 169)
    start 63, size 390721905 (190782 MB, Cyls 0-76567/9/1), Active
1: <UNUSED>
2: <UNUSED>
3: <UNUSED>
Bootselector disabled.


DISKLABELS
==========

p4# disklabel wd0
# /dev/rwd0d:
type: ESDI
disk: DISK1
...
5 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 390721905        63       RAID                     # (Cyl.      0*- 387620)
 b:   2097152  20971647       swap                     # (Cyl.  20805*-  22885*)
 c: 390721905        63     unused      0     0        # (Cyl.      0*- 387620)
 d: 390721968         0     unused      0     0        # (Cyl.      0 - 387620)
disklabel: partitions a and b overlap

p4# disklabel wd1
# /dev/rwd1d:
type: ESDI
disk: DISK2
...
5 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 390721905        63       RAID                     # (Cyl.      0*- 387620)
 b:   2097152  20971647       swap                     # (Cyl.  20805*-  22885*)
 c: 390721905        63     unused      0     0        # (Cyl.      0*- 387620)
 d: 390721968         0     unused      0     0        # (Cyl.      0 - 387620)
disklabel: partitions a and b overlap

p4# disklabel raid0
disklabel: Invalid signature in mbr record 0
# /dev/rraid0d:
type: RAID
disk: raid
...
5 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:  20971520         0     4.2BSD   2048 16384 28088  # (Cyl.      0 -  20479)
 b:   2097152  20971520       swap                     # (Cyl.  20480 -  22527)
 d: 390721792         0     unused      0     0        # (Cyl.      0 - 381564*)
 e: 367653120  23068672     4.2BSD   2048 16384 29128  # (Cyl.  22528 - 381564*)
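
The "partitions a and b overlap" warning and the swap offsets above can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the three disklabels, and the 64-sector reserved area at the start of a RAIDframe component is an assumption (it is consistent with the sizes shown here, but is not stated anywhere in this output).

```python
# (offset, size) in sectors, copied from the disklabels of wd0/wd1 and raid0.
WD_A = (63, 390721905)         # RAID component on the raw disk
WD_B = (20971647, 2097152)     # swap on the raw disk
RAID_B = (20971520, 2097152)   # swap inside raid0

# Assumption: RAIDframe reserves 64 sectors at the start of each component.
RAID_RESERVED = 64


def end(part):
    """First sector after the partition."""
    offset, size = part
    return offset + size


def contains(outer, inner):
    """True if 'inner' lies entirely within 'outer'."""
    return outer[0] <= inner[0] and end(inner) <= end(outer)


# Where raid0b would start on the physical disk, under the assumption above.
raid_b_on_disk = WD_A[0] + RAID_RESERVED + RAID_B[0]

print("wd0b lies inside wd0a:", contains(WD_A, WD_B))
print("raid0b starts at disk sector:", raid_b_on_disk)
print("wd0b maps onto raid0b:",
      raid_b_on_disk == WD_B[0] and WD_B[1] == RAID_B[1])
```

If the assumption holds, the overlap is the deliberate kind: wd0b would cover the same sectors as raid0b, which fits the `dp` entry for /dev/wd0b in the fstab.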


RAID CONF & BEHAVIOR
====================

p4# raidctl -a /dev/wd1a raid0
>> Warning: truncating spare disk /dev/wd1a to 390721792 blocks (from 390721841)
p4# raidctl -F /dev/wd1a raid0
>> RECON: initiating reconstruction on col 1 -> spare at col 2
p4# raidctl -s raid0
Components:
           /dev/wd0a: optimal
           /dev/wd1a: reconstructing
Spares:
           /dev/wd1a: used_spare
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2005061401, Mod Counter: 52
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 390721792
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
/dev/wd1a status is: reconstructing.  Skipping label.
Component label for /dev/wd1a:
   Row: 0, Column: 1, Num Rows: 0, Num Columns: 0
   Version: 0, Serial Number: 0, Mod Counter: 0
   Clean: No, Status: 0
   sectPerSU: 0, SUsPerPU: 0, SUsPerRU: 0
   Queue size: 0, blocksize: 0, numBlocks: 0
   RAID Level:
   Autoconfig:No
   Root partition: No
   Last configured as: raid0
Parity status: DIRTY
Reconstruction is 0% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
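
The sizes reported by raidctl can be reproduced from the disklabel, again as a hedged sketch. It assumes that 64 sectors are reserved at the start of the component and that the usable size is rounded down to a whole number of stripe units (sectPerSU: 128). Both assumptions are inferred from the numbers in this output, not taken from the RAIDframe documentation.

```python
PARTITION_SIZE = 390721905   # size of wd0a / wd1a from the disklabel
SECT_PER_SU = 128            # sectPerSU from the component label
SECTOR_BYTES = 512           # blocksize from the component label
RAID_RESERVED = 64           # assumption: sectors reserved per component


def usable_sectors(partition_size, reserved, stripe_unit):
    """Return (sectors after the reserved area, same rounded down to stripe units)."""
    available = partition_size - reserved
    return available, (available // stripe_unit) * stripe_unit


available, num_blocks = usable_sectors(PARTITION_SIZE, RAID_RESERVED, SECT_PER_SU)
size_mb = num_blocks * SECTOR_BYTES // (1024 * 1024)

print("before truncation:", available)
print("after truncation:", num_blocks)
print("size in MB:", size_mb)
```

Under these assumptions the results line up with the "truncating spare disk" warning, with numBlocks in the component label, and with the "Total Sectors" line in the boot dmesg.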

p4# tail dmesg
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15)wd0: (id not found)

raid0: Recon read failed!

p4# raidctl -S raid0
Reconstruction is 0% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
Reconstruction status:
  0% |                                       | ETA:    00:00 /

************************************
... after a few hours of waiting ...
************************************

p4# shutdown -r now

p4# tail dmesg
raid0: RAID Level 1
raid0: Components: /dev/wd0a /dev/wd1a
raid0: Total Sectors: 390721792 (190782 MB)
boot device: raid0
root on raid0a dumps on raid0b
ums0 at uhidev0: 5 buttons and Z dir.
wsmouse0 at ums0 mux 0
root file system type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd1: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd1: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd1: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd1: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd1: (id not found)
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15), retrying
wd0: (id not found)
wd1a: error reading fsbn 268435392 of 268435392-268435519 (wd1 bn 268435455; cn 266305 tn 0 sn 15)wd1: (id not found)

raid0: IO Error.  Marking /dev/wd1a as failed.
wd0a: error reading fsbn 268435392 of 268435392-268435519 (wd0 bn 268435455; cn 266305 tn 0 sn 15)wd0: (id not found)

Unable to verify raid1 parity: can't read stripe
Could not verify parity
raid0: Error re-writing parity!

p4# raidctl -s raid0
Components:
           /dev/wd0a: optimal
           /dev/wd1a: failed
No spares.
Component label for /dev/wd0a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 2005061401, Mod Counter: 59
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 390721792
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
/dev/wd1a status is: failed.  Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

-- 
teemu@rinta-aho.org -+- http://www.rinta-aho.org