bin/38471: Raidframe crashes on reconstruction of RAID5 (5 disks @ 298GB)

To: gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: bin/38471: Raidframe crashes on reconstruction of RAID5 (5 disks @ 298GB)
From: thomas%t-online.de@localhost
Date: Sun, 20 Apr 2008 19:20:00 +0000 (UTC)

>Number:         38471
>Category:       bin
>Synopsis:       Raidframe crashes on reconstruction of RAID5 (5 disks @ 298GB)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 20 19:20:00 +0000 2008
>Originator:     Thomas Feddersen
>Release:        4.0_BETA2
>Organization:
Dipl.-Ing Thomas Feddersen, Beratender Ingenieur
>Environment:
NetBSD bremen 4.0_BETA2 NetBSD 4.0_BETA2 (XEN3_DOM0) #0: Thu Mar  1 04:57:05 PST 2007 builds@wb33:/home/builds/ab/netbsd-4/i386/200703010002Z-obj/home/builds/ab/netbsd-4/src/sys/arch/i386/compile/XEN3_DOM0 i386
>Description:
One disk of the RAID set has pending sectors, as reported by smartd:
Apr 20 18:09:26 bremen smartd[201]: Device: /dev/wd5d, 1 Currently unreadable (pending) sectors
Apr 20 18:09:26 bremen smartd[201]: Device: /dev/wd5d, 1 Offline uncorrectable sectors

I attempted an in-place reconstruction of the affected drive, but the 
reconstruction panics the kernel and drops the system into the debugger.

A second attempt, adding a spare component and reconstructing onto the spare, 
has the same effect.

The problem occurs with both the XEN3 kernel (128 MB total memory) and the 
i386 kernel (1 GB total memory).

I need assistance to restore the redundancy of the RAID set.
>How-To-Repeat:
Set up a RAID 5 set with five 298 GB drives and fail one drive:

bremen# raidctl -s raid0
Components:
           /dev/wd1a: optimal
           /dev/wd2a: optimal
           /dev/wd3a: optimal
           /dev/wd4a: optimal
           /dev/wd5a: failed
No spares.
Component label for /dev/wd1a:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 5
   Version: 2, Serial Number: 2007020501, Mod Counter: 1543
   Clean: No, Status: 0
   sectPerSU: 16, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 625142320
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/wd2a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 5
   Version: 2, Serial Number: 2007020501, Mod Counter: 1543
   Clean: No, Status: 0
   sectPerSU: 16, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 625142320
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/wd3a:
   Row: 0, Column: 2, Num Rows: 1, Num Columns: 5
   Version: 2, Serial Number: 2007020501, Mod Counter: 1543
   Clean: No, Status: 0
   sectPerSU: 16, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 625142320
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
Component label for /dev/wd4a:
   Row: 0, Column: 3, Num Rows: 1, Num Columns: 5
   Version: 2, Serial Number: 2007020501, Mod Counter: 1543
   Clean: No, Status: 0
   sectPerSU: 16, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 625142320
   RAID Level: 5
   Autoconfig: Yes
   Root partition: No
   Last configured as: raid0
/dev/wd5a status is: failed.  Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
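The "298GB" in the synopsis can be cross-checked against the component labels 
above. The following Python sketch is an illustration added for this report, 
not part of RAIDframe or raidctl; the helper names are hypothetical, and it 
assumes every labelled block is used, ignoring any rounding RAIDframe may apply.

```python
"""Back-of-the-envelope sizes for raid0, using values from the component label."""

# Values copied from the "Component label for /dev/wd1a" output above.
BLOCK_SIZE = 512          # blocksize: bytes per sector
NUM_BLOCKS = 625142320    # numBlocks: sectors per component
SECT_PER_SU = 16          # sectPerSU: sectors per stripe unit
NUM_COLUMNS = 5           # Num Columns: disks in the set

GIB = 1024 ** 3


def component_bytes(num_blocks: int, block_size: int) -> int:
    """Raw capacity of one component as recorded in its label."""
    return num_blocks * block_size


def raid5_usable_bytes(num_columns: int, num_blocks: int, block_size: int) -> int:
    """RAID 5 stores one parity unit per stripe, so N-1 columns carry data.

    Hypothetical helper; assumes every labelled block holds data or parity.
    """
    return (num_columns - 1) * component_bytes(num_blocks, block_size)


if __name__ == "__main__":
    comp = component_bytes(NUM_BLOCKS, BLOCK_SIZE)
    usable = raid5_usable_bytes(NUM_COLUMNS, NUM_BLOCKS, BLOCK_SIZE)
    print(f"component: {comp} bytes = {comp / 1e9:.1f} GB = {comp / GIB:.1f} GiB")
    print(f"stripe unit: {SECT_PER_SU * BLOCK_SIZE // 1024} KiB")
    print(f"stripes per component: {NUM_BLOCKS // SECT_PER_SU}")
    print(f"usable RAID 5 capacity: {usable / GIB:.1f} GiB")
```

With these label values each component holds 320,072,867,840 bytes, i.e. 
about 320 GB in decimal units or 298 GiB in binary units, which matches the 
"298GB" in the synopsis. The stripe unit is 8 KiB.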

Assign plenty of swap space:

bremen# swapctl -lh
Device             Size     Used    Avail Capacity  Priority
/dev/wd0b          129M       0B     129M     0%    0
/home/fed/swapfile 4.9G       0B     4.9G     0%    0
Total              5.0G       0B     5.0G 

Issue the command for in-place reconstruction:

bremen# raidctl -R /dev/wd5a raid0

The system drops into the debugger (green text on the console):

raid0: initiating in-place reconstruction on column 4
panic: malloc: out of space in kmem_map
stopped in pid 885.1 (raid_reconip) at  netbsd:cpu_Debugger+0x4:  popl    %ebp
db>

The only way out is to reboot the system and run the RAID set in degraded 
mode.
>Fix:
Not known.

According to the thread "NetBSD-users: Problem with raidframe under NetBSD-3 
and NetBSD-4", the system must be taken out of service.
