NetBSD-Users archive
Re: raidframe panic
On Sun, 27 May 2012 07:13:37 -0400
Greg Troxel <gdt%ir.bbn.com@localhost> wrote:
>
> I have a machine with 2 x 400G SATA drives, which are too small, so I
> bought 2 x 2T SATA drives as replacements. I put one of the new ones
> in an Aluratek docking station (== external drive case electrically,
> with mechanicals for easy swapping), and then (netbsd-5, i386)
>
> created gpt label and RF partition
>
> created RAID1 set with this drive and a missing drive
>
> disklabeled the drive
>
> made filesystems, copied some data, etc.
>
> Then, I
>
> unmounted the filesystems
>
> didn't do anything about the raid set
>
> powered off the drive
>
> waited about 10s
>
> did something like 'raidctl -s raid1', or 'disklabel raid1', and
> got a crash
>
> It seems unplugging USB drives ought to be stable. I realize raid is
> tricky, because there's deconfiguring and there's failed. But a drive
> going away from USB is pretty much failed, so this ought to be
> graceful. Am I confused, or have I found a bug?

I suspect you've found a bug...

> Separately, it seems like I should have done 'raidctl -u'.

That would have avoided this problem, yes....

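For reference, that order of operations can be sketched as below. This is only a sketch: the mount point /mnt/new and the run/detach_raid helpers are hypothetical names made up for illustration, and raid1 is simply the set named earlier in this thread. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: detach a RAIDframe set cleanly before powering off its disk.
# RAID_DEV and MOUNTS are assumptions; substitute your own values.
RAID_DEV="${RAID_DEV:-raid1}"
MOUNTS="${MOUNTS:-/mnt/new}"

# Print each command; actually run it only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

detach_raid() {
    for m in $MOUNTS; do
        run umount "$m" || return 1    # unmount the filesystems first
    done
    run raidctl -u "$RAID_DEV"         # then unconfigure the RAID set
}

detach_raid
```

Run as-is to see the commands; set DRY_RUN=0 (as root) to execute them, and power off the drive only after raidctl -u has returned successfully.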
> Also, it would be nice if
>
> unmounted filesystems caused the raid set to be put in a state
> similar to unconfigured relative to clean/dirty status (it probably
> does)

Yes, it does that.

> when a raid set's disks all go away, perhaps it should just vanish
> if it's autoconfigured, so plugging in two usb disks of a RAID1 set
> brings it back and it's just like a single disk.

RAIDframe isn't really designed to work this way...

>
> But I think if the result was that raid1 showed as having the missing
> disk as failed/missing and no panic, things would be much better.
>
>
>
> #0 0xc05e55c2 in cpu_reboot ()
> #1 0xc0516890 in panic ()
> #2 0xc05e8467 in trap ()
> #3 0xc010ccb7 in calltrap ()
> #4 0xc05e06a1 in db_read_bytes ()
> #5 0xc01dabf7 in db_get_value ()
> #6 0xc05e107d in db_stack_trace_print ()
> #7 0xc0516865 in panic ()
> #8 0xc05e8467 in trap ()
> #9 0xc010ccb7 in calltrap ()
> #10 0xc04acd8a in dkstrategy ()

Can you tell me what line in dkstrategy it's trapping on? My guess is
that bp->b_dev is no longer pointing to a valid device, but that should
have been caught in bdev_strategy()...

> #11 0xc050b289 in bdev_strategy ()
> #12 0xc0204ca9 in rf_DispatchKernelIO ()
> #13 0xc01fce91 in rf_DiskIOEnqueue ()
> #14 0xc01fb17f in rf_DiskReadFuncForThreads ()
> #15 0xc01ffdd9 in FireNode ()
> #16 0xc01ffeed in FireNodeList ()
> #17 0xc02003d0 in rf_FinishNode ()
> #18 0xc01fae3d in rf_NullNodeFunc ()
> #19 0xc01ffdd9 in FireNode ()
> #20 0xc0200065 in rf_DispatchDAG ()
> #21 0xc0214c27 in rf_State_ExecuteDAG ()
> #22 0xc02155aa in rf_ContinueRaidAccess ()
> #23 0xc01feffd in rf_DoAccess ()
> #24 0xc0204fbe in raidstart ()
> #25 0xc0200880 in rf_RaidIOThread ()
> #26 0xc01002e1 in lwp_trampoline ()

Later...

Greg Oster