Subject: panic while building a raid-1 set one component at a time
To: None <current-users@netbsd.org>
From: Jeff Rizzo <riz@boogers.sf.ca.us>
List: current-users
Date: 10/05/2003 12:12:07
I've done this before, but not for about a year, so I'm not sure
whether I'm doing something wrong here. I'm working with a GENERIC
kernel from around September 28 on i386 (from the releng.netbsd.org
snapshot of that day).
I've got two identical disks, and constructed half a raid-1 on one (I
needed the other to bootstrap from sysinst) as it says to do in the
raidctl man page; it seems to be working fine in degraded mode.
The two disks are wd1 and wd2; wd2 is the working component of the raid
set; I'm trying to add wd1. I copied the disklabel from wd2 onto wd1,
did a 'raidctl -a /dev/wd1a raid0', and then when I try to do the
'raidctl -F component0 raid0', it panics:
# raidctl -a /dev/wd1a raid0
Warning: truncating spare disk /dev/wd1a to 488396928 blocks
# Oct 5 10:26:49 /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
raidctl -F component0 raid0
RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
raid0: Quiescence reached..
panic: malloc: out of space in kmem_map
Stopped in pid 399.1 (raid_recon) at netbsd:cpu_Debugger+0x4: leave
db>
Now, I'm wondering about the "Warning: truncating spare disk" message;
I can't see anything different about the labels of wd1 and wd2, and I
didn't get that message when I built wd2.
One interesting point: I can't seem to change the info on wd2c in the
disklabel; it always returns to
c: 15 0 unused 0 0 # (Cyl. 0 - 0*)
no matter how I edit it with "disklabel", even though the edits always
seem to take.
Anyway, here's the entire sequence. I hope there's some clue in here
somewhere...
# disklabel wd1
# /dev/rwd1d:
type: ESDI
disk: WDC WD2500JB-32F
label: fictitious
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 484521
total sectors: 488397168
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 488397105        63       RAID                     # (Cyl.      0*- 484520)
 c: 488397105        63     unused      0     0        # (Cyl.      0*- 484520)
 d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
# disklabel wd2
# /dev/rwd2d:
type: ESDI
disk: WDC WD2500JB-32F
label: fictitious
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 484521
total sectors: 488397168
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
#        size    offset     fstype [fsize bsize cpg/sgs]
 a: 488397105        63       RAID                     # (Cyl.      0*- 484520)
 c:        15         0     unused      0     0        # (Cyl.      0 -      0*)
 d: 488397168         0     unused      0     0        # (Cyl.      0 - 484520)
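As a sanity check on the two labels above, here is a small Python sketch that re-enters the partition rows by hand, confirms that the geometry and partition extents are self-consistent, and lists which entries differ between wd1 and wd2. The dictionary layout and the helper names (fits_on_disk, differing_partitions) are purely illustrative; they are not part of disklabel.

```python
# Partition tables copied by hand from the two disklabel listings above.
# Each entry is (size_in_sectors, offset_in_sectors, fstype).
TOTAL_SECTORS = 488397168
SECTORS_PER_CYLINDER = 1008
CYLINDERS = 484521

wd1 = {
    "a": (488397105, 63, "RAID"),
    "c": (488397105, 63, "unused"),
    "d": (488397168, 0, "unused"),
}
wd2 = {
    "a": (488397105, 63, "RAID"),
    "c": (15, 0, "unused"),
    "d": (488397168, 0, "unused"),
}


def fits_on_disk(label, total=TOTAL_SECTORS):
    """Return True if every partition ends at or before the last sector."""
    return all(size + offset <= total for size, offset, _ in label.values())


def differing_partitions(left, right):
    """Return the sorted partition letters whose entries are not identical."""
    letters = set(left) | set(right)
    return sorted(p for p in letters if left.get(p) != right.get(p))


# cylinders * sectors/cylinder should reproduce "total sectors".
geometry_ok = CYLINDERS * SECTORS_PER_CYLINDER == TOTAL_SECTORS

print("geometry consistent:", geometry_ok)
print("wd1 fits:", fits_on_disk(wd1), "wd2 fits:", fits_on_disk(wd2))
print("partitions that differ:", differing_partitions(wd1, wd2))
```

The RAID partition `a` is identical on both disks, so as far as these listings go the label of the RAID partition itself does not explain the truncation warning; only the `c` entry differs.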
# raidctl -a /dev/wd1a raid0
Warning: truncating spare disk /dev/wd1a to 488396928 blocks
# Oct 5 11:07:15 /netbsd: Warning: truncating spare disk /dev/wd1a to 488396928 blocks
raidctl -s raid0
Components:
component0: failed
/dev/wd2a: optimal
Spares:
/dev/wd1a: spare
component0 status is: failed. Skipping label.
Component label for /dev/wd2a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 20031005, Mod Counter: 101
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 488396928
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
/dev/wd1a status is: spare. Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
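The truncation figure can be reproduced with simple arithmetic from the numbers already shown. The sketch below rests on an assumption that is not stated anywhere in this output: that RAIDframe sets aside 64 sectors at the start of each component for its own label, and then rounds what is left down to a whole number of stripe units (sectPerSU: 128). The names RESERVED_SECTORS and usable_blocks are hypothetical and only serve the illustration.

```python
PARTITION_SECTORS = 488397105    # size of wd1a and wd2a from disklabel
SECT_PER_SU = 128                # sectPerSU from the wd2a component label
REPORTED_NUM_BLOCKS = 488396928  # numBlocks, and the size in the warning

# ASSUMPTION: 64 sectors are reserved at the front of each component.
# This value is not shown in the output above.
RESERVED_SECTORS = 64


def usable_blocks(partition_sectors, reserved, sect_per_su):
    """Subtract the reserved area, then round down to a stripe-unit multiple."""
    usable = partition_sectors - reserved
    return (usable // sect_per_su) * sect_per_su


computed = usable_blocks(PARTITION_SECTORS, RESERVED_SECTORS, SECT_PER_SU)
print("computed usable blocks:", computed)
print("matches reported numBlocks:", computed == REPORTED_NUM_BLOCKS)
```

Under that assumption the spare is being cut to exactly the size the existing component already uses, so the warning would be consistent with the array geometry rather than a sign that the two labels differ.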
# raidctl -F component0 raid0
RECON: initiating reconstruction on row 0 col 0 -> spare at row 0 col 2
raid0: Quiescence reached..
panic: malloc: out of space in kmem_map
Stopped in pid 398.1 (raid_recon) at netbsd:cpu_Debugger+0x4: leave
db> bt
cpu_Debugger(0,e8f000,c087c000,0,e8f000) at netbsd:cpu_Debugger+0x4
panic(c0695840,0,e8f000,0,3a38b1) at netbsd:panic+0x11d
malloc(e8e2c4,c06cad40,0,0,3a38b1) at netbsd:malloc+0x167
rf_MakeReconMap(c08d5000,80,0,1d1c5880,0) at netbsd:rf_MakeReconMap+0xc2
rf_MakeReconControl(c0974900,0,0,0,2) at netbsd:rf_MakeReconControl+0x171
rf_ContinueReconstructFailedDisk(c0974900,0,2,0,c20ac4e0) at netbsd:rf_ContinueReconstructFailedDisk+0xc1
rf_ReconstructFailedDiskBasic(c08d5000,0,0,c08d5000,c088fe60) at netbsd:rf_ReconstructFailedDiskBasic+0xb9
rf_ReconstructFailedDisk(c08d5000,0,0,1,c0100d22) at netbsd:rf_ReconstructFailedDisk+0x60
rf_FailDisk(c08d5000,0,0,1,c42bd1b8) at netbsd:rf_FailDisk+0xc7
rf_ReconThread(c0924ec0,7e0000,7e9000,0,c010030c) at netbsd:rf_ReconThread+0x43
db>
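One way to read the backtrace above is to decode the hexadecimal words that ddb prints for each frame. Treat this as an assumption: on i386 these are raw stack words and are not guaranteed to be the real arguments. Read that way, the second and fourth words of the rf_MakeReconMap frame line up with sectPerSU and numBlocks, and the first word of the malloc frame would be the requested size in bytes. The variable names below are illustrative only.

```python
# Hex words copied from the ddb backtrace above.
# ASSUMPTION: they correspond to the actual call arguments.
malloc_size = 0xe8e2c4      # first word of the malloc() frame
ru_sectors = 0x80           # second word of the rf_MakeReconMap() frame
disk_sectors = 0x1d1c5880   # fourth word of the rf_MakeReconMap() frame

# Number of 128-sector reconstruction units across the component.
recon_units = disk_sectors // ru_sectors

# How many bytes of the request each reconstruction unit accounts for.
bytes_per_unit = malloc_size / recon_units

print("disk_sectors:", disk_sectors)
print("reconstruction units:", recon_units)
print("malloc request: %d bytes (%.1f MiB)" % (malloc_size, malloc_size / 2**20))
print("bytes per reconstruction unit:", bytes_per_unit)
```

If that reading is right, disk_sectors decodes to 488396928 (the numBlocks value), and the request is about 14.6 MiB in a single allocation: 4 bytes, which is one pointer-sized entry on i386, for every 128-sector reconstruction unit on the component. A request of that size would be consistent with the "out of space in kmem_map" panic on a disk this large.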
db> ps
 PID  PPID  PGRP  UID  S  FLAGS    LWPS  COMMAND     WAIT
>398  0     0     0    2  0x20200  1     raid_recon
 351  332   351   0    2  0x4002   1     raidctl
 349  1     1     0    2  0x4000   1     getty       nanosle
 333  1     1     0    2  0x4000   1     getty       nanosle
 343  1     1     0    2  0x4000   1     getty       nanosle
 332  1     332   0    2  0x4003   1     csh         pause
 337  1     337   0    2  0        1     cron        nanosle
 330  1     330   0    2  0        1     inetd       kqread
 281  1     281   0    2  0        1     sshd        select
 171  1     171   0    2  0        1     rpcbind     select
 150  1     150   0    2  0        1     syslogd
 120  1     120   0    2  0        1     dhclient    select
 8    0     0     0    2  0x20200  1     aiodoned    aiodone
 7    0     0     0    2  0x20200  1     ioflush     syncer
 6    0     0     0    2  0x20200  1     reaper      reaper
 5    0     0     0    2  0x20200  1     pagedaemon  pgdaemo
 4    0     0     0    2  0x20200  1     lfs_writer  lfswrit
 3    0     0     0    2  0x20200  1     raidio0     raidiow
 2    0     0     0    2  0x20200  1     raid0       rfwcond
 1    0     1     0    2  0x4000   1     init        wait
 0    -1    0     0    2  0x20200  1     swapper
db>
Thanks in advance for any clues anyone can provide...
+j