Subject: Re: raidframe problems (revisited)
To: Greg Oster <oster@cs.usask.ca>
From: Louis Guillaume <lguillaume@berklee.edu>
List: netbsd-users
Date: 05/29/2007 08:24:50
Greg Oster wrote:
> With the array in degraded mode, can you mount /dev/wd1a (or
> equivalent) as a filesystem, and run a series of stress-tests on
> that, at the same time that you stress the RAID set? Something like:
>
> foreach i (`jot 1000`)
>   cp src.tar.gz src.tar.gz.$i && rm -f src.tar.gz.$i &
>   sleep 10
>   dd if=/dev/zero of=bigfile.$i bs=10m count=100 && rm -f bigfile.$i &
>   sleep 10
>   dd if=src.tar.gz of=/dev/null bs=10m &
> end
>
> that end up running on both wd0a and wd1a at the same time. In an
> ideal world, take RAIDframe out of the equation entirely, and push
> the disks, both reads and writes... (If you have an area reserved for
> swap on both, you could disable swap, and use that space). And then
> once the disks are "busy", do something like extract src.tar.gz to
> both wd0a and wd1a, and compare the bits as extracted and see if
> there are differences. (You'll need to tune things so you don't run
> out of space, of course)
This is a great idea, and I'll add it to my list of tests to try to
reproduce the problem.
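For the extract-and-compare step, I'm picturing something like the
following (the mount points /mnt/wd0a and /mnt/wd1a and the path to
src.tar.gz are just placeholders for whatever I end up using):

  #!/bin/sh
  # Extract the same tarball onto each disk while the stress loops run,
  # then compare the two trees bit for bit.
  mkdir -p /mnt/wd0a/test /mnt/wd1a/test
  tar -xzf /root/src.tar.gz -C /mnt/wd0a/test
  tar -xzf /root/src.tar.gz -C /mnt/wd1a/test
  # Any output here means the two copies came out different.
  diff -r /mnt/wd0a/test /mnt/wd1a/test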
> I suspect it's a drive controller issue (or driver issue) that only
> manifests itself when you push both channels really hard...
>
Judging from your experience and what others have said about the
stability of RAIDframe, I strongly suspect the controller (or driver)
too, especially since the RAID-1 set works fine with only one component!
It's not as if the system doesn't have the right data in the buffers to
write out to disk. And I don't believe memory is the problem, because it
has already been replaced.
What I haven't tested yet is maxing out the I/O on both channels at the
same time, so I will do that next...
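Roughly what I have in mind, along the lines of your suggestion to use
the swap space (wd0b and wd1b as the swap partitions, and the roughly
1GB worth of blocks, are assumptions about my layout):

  #!/bin/sh
  # Take the swap partitions out of service so they can be used as
  # scratch space, then hammer both channels with simultaneous reads
  # and writes.
  swapctl -d /dev/wd0b
  swapctl -d /dev/wd1b

  # Writes to both disks at once (about 1GB each)...
  dd if=/dev/zero of=/dev/rwd0b bs=64k count=16384 &
  dd if=/dev/zero of=/dev/rwd1b bs=64k count=16384 &

  # ...and simultaneous reads from both (raw reads don't touch the
  # data on wd0a/wd1a).
  dd if=/dev/rwd0a of=/dev/null bs=64k count=16384 &
  dd if=/dev/rwd1a of=/dev/null bs=64k count=16384 &
  wait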
Thanks!
Louis