current-users: panic while building a raid-1 set one component at a time

Subject: panic while building a raid-1 set one component at a time
To: None <current-users@netbsd.org>
From: Jeff Rizzo <riz@boogers.sf.ca.us>
List: current-users
Date: 10/05/2003 12:12:07
I've done this before, but not for about a year, so I'm not sure whether
I'm doing something wrong here.  I'm running a GENERIC kernel from around
September 28 on i386 (from the releng.netbsd.org snapshot of that day).

I've got two identical disks, and I constructed half of a RAID-1 set on
one of them (I needed the other to bootstrap from sysinst), as the raidctl
man page describes; it seems to be working fine in degraded mode.

The two disks are wd1 and wd2: wd2 is the working component of the RAID
set, and I'm trying to add wd1.  I copied the disklabel from wd2 onto wd1,
ran 'raidctl -a /dev/wd1a raid0' to add wd1a as a hot spare, and then when
I run 'raidctl -F component0 raid0' to fail component0 and reconstruct onto
the spare, it panics:

# raidctl -a /dev/wd1a raid0
Warning: truncating spare disk /dev/wd1a to 488396928 blocks
Oct  5 10:26:49  /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
# raidctl -F component0 raid0
RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
raid0: Quiescence reached..
panic: malloc: out of space in kmem_map
Stopped in pid 399.1 (raid_recon) at    netbsd:cpu_Debugger+0x4:        leave
db> 

Now, I'm wondering about the "Warning: truncating spare disk" message;
I can't see anything different about the labels of wd1 and wd2, and I
didn't get that message when I built wd2.
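For what it's worth, the truncated size can be reproduced with simple arithmetic. This is only a sketch under unverified assumptions, not RAIDframe's actual code: it assumes RAIDframe reserves 64 sectors (RF_PROTECTED_SECTORS) at the start of each component and rounds what is left down to a whole number of stripe units, using sectPerSU: 128 from the wd2a component label further down. The names PROTECTED_SECTORS and usable_sectors are hypothetical.

```python
# Hypothetical sketch: reproduce the "truncating spare disk" size from the
# wd1a partition size. Assumptions (not confirmed from the RAIDframe source):
# 64 sectors are reserved at the start of each component, and the remainder
# is rounded down to a whole number of stripe units.

PARTITION_SECTORS = 488397105  # size of wd1a (and wd2a) from disklabel
PROTECTED_SECTORS = 64         # assumed value of RF_PROTECTED_SECTORS
SECT_PER_SU = 128              # sectPerSU from the wd2a component label


def usable_sectors(partition_sectors: int,
                   protected: int = PROTECTED_SECTORS,
                   sect_per_su: int = SECT_PER_SU) -> int:
    """Sectors left after the reserved area, rounded down to whole stripe units."""
    raw = partition_sectors - protected
    return (raw // sect_per_su) * sect_per_su


if __name__ == "__main__":
    size = usable_sectors(PARTITION_SECTORS)
    print(size)                       # 488396928
    print(PARTITION_SECTORS - size)   # 177 sectors left unused
```

If those assumptions hold, 488396928 is also the numBlocks value in wd2a's component label, so the truncated size itself matches the existing component; it doesn't explain why the warning only shows up for the spare.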

One odd thing: I can't seem to change the entry for wd2c in the
disklabel.  No matter how I edit it with "disklabel", it always reverts to

 c:        15         0     unused      0     0        # (Cyl.      0 -      0*)

even though the edits always seem to take.

Anyway, here's the entire sequence.  I hope there's some clue in here
somewhere...

# disklabel wd1
# /dev/rwd1d:
type: ESDI
disk: WDC WD2500JB-32F
label: fictitious
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 484521
total sectors: 488397168
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 488397105        63       RAID                     # (Cyl.      0*- 484520)
 c: 488397105        63     unused      0     0        # (Cyl.      0*- 484520)
 d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
# disklabel wd2
# /dev/rwd2d:
type: ESDI
disk: WDC WD2500JB-32F
label: fictitious
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 484521
total sectors: 488397168
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 488397105        63       RAID                     # (Cyl.      0*- 484520)
 c:        15         0     unused      0     0        # (Cyl.      0 -      0*)
 d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
# raidctl -a /dev/wd1a raid0
Warning: truncating spare disk /dev/wd1a to 488396928 blocks
Oct  5 11:07:15  /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
# raidctl -s raid0
Components:
          component0: failed
           /dev/wd2a: optimal
Spares:
           /dev/wd1a: spare
component0 status is: failed.  Skipping label.
Component label for /dev/wd2a:
   Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 20031005, Mod Counter: 101
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 488396928
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
/dev/wd1a status is: spare.  Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
# raidctl -F component0 raid0
RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
raid0: Quiescence reached..
panic: malloc: out of space in kmem_map
Stopped in pid 398.1 (raid_recon) at    netbsd:cpu_Debugger+0x4:        leave
db> bt
cpu_Debugger(0,e8f000,c087c000,0,e8f000) at netbsd:cpu_Debugger+0x4
panic(c0695840,0,e8f000,0,3a38b1) at netbsd:panic+0x11d
malloc(e8e2c4,c06cad40,0,0,3a38b1) at netbsd:malloc+0x167
rf_MakeReconMap(c08d5000,80,0,1d1c5880,0) at netbsd:rf_MakeReconMap+0xc2
rf_MakeReconControl(c0974900,0,0,0,2) at netbsd:rf_MakeReconControl+0x171
rf_ContinueReconstructFailedDisk(c0974900,0,2,0,c20ac4e0) at netbsd:rf_ContinueReconstructFailedDisk+0xc1
rf_ReconstructFailedDiskBasic(c08d5000,0,0,c08d5000,c088fe60) at netbsd:rf_ReconstructFailedDiskBasic+0xb9
rf_ReconstructFailedDisk(c08d5000,0,0,1,c0100d22) at netbsd:rf_ReconstructFailedDisk+0x60
rf_FailDisk(c08d5000,0,0,1,c42bd1b8) at netbsd:rf_FailDisk+0xc7
rf_ReconThread(c0924ec0,7e0000,7e9000,0,c010030c) at netbsd:rf_ReconThread+0x43
db> 
db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
>398              0        0          0 2 0x20200    1       raid_recon
 351            332      351          0 2  0x4002    1          raidctl
 349              1        1          0 2  0x4000    1            getty nanosle
 333              1        1          0 2  0x4000    1            getty nanosle
 343              1        1          0 2  0x4000    1            getty nanosle
 332              1      332          0 2  0x4003    1              csh   pause
 337              1      337          0 2       0    1             cron nanosle
 330              1      330          0 2       0    1            inetd  kqread
 281              1      281          0 2       0    1             sshd  select
 171              1      171          0 2       0    1          rpcbind  select
 150              1      150          0 2       0    1          syslogd
 120              1      120          0 2       0    1         dhclient  select
 8                0        0          0 2 0x20200    1         aiodoned aiodone
 7                0        0          0 2 0x20200    1          ioflush  syncer
 6                0        0          0 2 0x20200    1           reaper  reaper
 5                0        0          0 2 0x20200    1       pagedaemon pgdaemo
 4                0        0          0 2 0x20200    1       lfs_writer lfswrit
 3                0        0          0 2 0x20200    1          raidio0 raidiow
 2                0        0          0 2 0x20200    1            raid0 rfwcond
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper
db> 
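Decoding the hex arguments in the backtrace above gives a rough idea of how large the failing allocation was. The readings are assumptions, not checked against the source: rf_MakeReconMap's second argument (0x80) is taken as sectors per reconstruction unit, its third (0x1d1c5880) as the component size in sectors, and malloc's first argument (0xe8e2c4) as the requested size in bytes. With those readings the numbers line up exactly with one 4-byte pointer per reconstruction unit. The constant and function names are hypothetical.

```python
# Hypothetical decoding of the hex arguments in the ddb backtrace.
# Assumptions: rf_MakeReconMap's 2nd/3rd arguments are sectors per
# reconstruction unit and total component sectors; malloc's 1st argument
# is the size in bytes; pointers are 4 bytes on i386.

SECTORS_PER_RU = 0x80        # from rf_MakeReconMap(c08d5000,80,...)
DISK_SECTORS = 0x1D1C5880    # from rf_MakeReconMap(...,1d1c5880,...)
MALLOC_BYTES = 0xE8E2C4      # from malloc(e8e2c4,...)
POINTER_SIZE = 4             # assumed sizeof(void *) on i386


def recon_units(disk_sectors: int, sectors_per_ru: int) -> int:
    """Number of reconstruction units needed to cover the component (rounded up)."""
    return -(-disk_sectors // sectors_per_ru)


if __name__ == "__main__":
    units = recon_units(DISK_SECTORS, SECTORS_PER_RU)
    print(DISK_SECTORS)                           # 488396928, same as numBlocks
    print(units)                                  # 3815601
    print(units * POINTER_SIZE == MALLOC_BYTES)   # True
    print(round(MALLOC_BYTES / 2**20, 1))         # 14.6 (MiB)
```

If that reading is right, the reconstruction map asks for roughly 14.6 MiB of kernel malloc space in a single allocation, which is at least consistent with the "out of space in kmem_map" panic.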

Thanks in advance for any clues anyone can provide...

+j