Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 03/12/2003 11:43:54
Brian Buhrow writes:
> Hello Greg. I tried a similar experiment here, and found the same
> result. Swapping to a raw disk partition, as opposed to a raid partition,
> works fine, even if the rest of the filesystem i/o is going through the raid.
> This experience triggered a memory I had from using raidframe under 1.5R.
> On that system, I could not run fsck -n on a mounted filesystem if the
> filesystem sat on a raid5 device because doing so would cause uvm to go
> into a tight loop claiming it was performing deadlock avoidance. When the
> machine died this evening while I was paging to the raid 5 device, I
> noticed that it died gradually, as if some resource had become unavailable
> and then, as more things began to need that resource, things coasted to a
> halt.
> In thinking about this some more, I'm pretty sure that the problem
> isn't a lack of kernel memory, as we previously thought. Vmstat -m
> consistently showed only 3MB of kernel memory in use during the entire run
> up to death.
Using kgdb, when swapping to RAID 5, after the "hang" I see:
(gdb) xps
proc pid flag st wchan comm
0xcb25aad4 236 4006 3 0xc06bbc90 emacs (flt_noram5)
0xcb25a908 235 4086 3 0xc06a7748 emacs (select)
0xcb34a3a8 233 4006 3 0xc06bbc90 top (flt_noram1)
0xcb34a1dc 225 4082 3 0xcb34b624 csh (pause)
0xcb34a010 224 4084 3 0xc06a7748 telnetd (select)
0xcb25a3a4 213 4006 3 0xc06bbc90 csh (uao_getpage)
0xcb16d900 212 4084 3 0xc06a7748 telnetd (select)
0xcb16d734 206 4082 3 0xca92c008 getty (ttyin)
0xcb25a73c 204 80 3 0xc0681a60 cron (nanosleep)
0xcb25a570 201 80 3 0xc06a7748 inetd (select)
0xcb25a1d8 153 84 3 0xc0a12600 nfsd (nfsd)
0xcb25a00c 152 84 3 0xc0a12800 nfsd (nfsd)
0xcb196c9c 151 84 3 0xc0a12e00 nfsd (nfsd)
0xcb16dc98 150 84 3 0xc0a12000 nfsd (nfsd)
0xcb196ad0 148 80 3 0xc06a7748 nfsd (select)
0xcb1961d4 140 80 3 0xc06a7748 mountd (select)
0xcb196904 124 20284 3 0xc069cb4c nfsio (nfsidl)
0xcb196738 123 20284 3 0xc069cb48 nfsio (nfsidl)
0xcb19656c 122 20284 3 0xc069cb44 nfsio (nfsidl)
0xcb1963a0 121 20284 3 0xc069cb40 nfsio (nfsidl)
0xcb196008 112 80 3 0xc06a7748 rpcbind (select)
0xcb16dacc 101 4 3 0xc077a13c syslogd (anonget2)
0xcb16d568 12 20204 3 0xc06bbe38 aiodoned (aiodoned)
0xcb16d39c 11 20204 3 0xc06bbc90 ioflush (km_getwait2)
0xcb16d1d0 10 20204 3 0xc06a6bb0 reaper (reaper)
0xcb16d004 9 20204 3 0xc06bbe2c pagedaemon (pgdaemon)
0xcb154c94 8 20204 3 0xc06bbc90 raid (km_getwait2)
0xcb154ac8 7 20204 3 0xc0988160 raid (rfwcond)
0xcb1548fc 6 20204 3 0xc0968eac scsibus2 (sccomp)
0xcb154730 5 20204 3 0xc095d4ac scsibus1 (sccomp)
0xcb154564 4 20204 3 0xc09618ac scsibus0 (sccomp)
0xcb154398 3 20204 3 0xc06cc5e8 usbtask (usbtsk)
0xcb1541cc 2 20204 3 0xc0960238 usb0 (usbevt)
0xcb154000 1 4080 3 0xcb154000 init (wait)
0xc06a4e60 0 20204 3 0xc06a4e60 swapper (scheduler)
proc pid flag st wchan comm
(gdb)
which indicates that lots of processes are waiting to get more kernel memory..
If you want to see "who was using what", grab:
http://www.cs.usask.ca/staff/oster/swapdebug.notes
and then cross-reference the lines with /usr/include/sys/malloc.h to see
who was using what... (I had these last night, but still haven't had time
to go through them carefully... A quick look showed the RAID stuff not
using more than 400K. I also haven't had time to look at what the flt_noram*
waiting is all about... I need to see how things are behaving in -current as
well, given that many of the allocations are now poolified..)
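(A side note on doing that cross-reference: if the notes turn out to list
allocations by numeric malloc type rather than by name, a throwaway C
program along the lines of the sketch below would do the lookup. The
format of swapdebug.notes is an assumption here, so treat this as a
starting point rather than a finished tool.)

/*
 * Hypothetical helper: print the M_* name from /usr/include/sys/malloc.h
 * for a given numeric malloc type.  Assumes the debug notes identify
 * allocations by type number.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	FILE *fp;
	char line[256], name[64];
	int val, want;

	if (argc != 2) {
		fprintf(stderr, "usage: %s type-number\n", argv[0]);
		return 1;
	}
	want = atoi(argv[1]);

	if ((fp = fopen("/usr/include/sys/malloc.h", "r")) == NULL) {
		perror("/usr/include/sys/malloc.h");
		return 1;
	}
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* match lines of the form: #define M_FOO <number> ... */
		if (sscanf(line, "#define M_%63s %d", name, &val) == 2 &&
		    val == want) {
			printf("M_%s\n", name);
			fclose(fp);
			return 0;
		}
	}
	fclose(fp);
	fprintf(stderr, "type %d not found\n", want);
	return 1;
}

Compile it with cc and feed it the type number from the notes; the name it
prints is the tag to grep for in the kernel sources to see which subsystem
made the allocations.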
> I'm thinking that perhaps there is a problem with the way
> physio() interacts with the locks you use to lock the individual
> components of the raid5 set. There's definitely a lot of calling around
> inside the kernel while locks are set, and I wonder if the i/o pattern
> raid5 uses tends to exercise the locking algorithm more heavily than do
> the raid0 and raid1 i/o patterns. Or, alternatively, if the raid5 pattern
> causes uvm to lock itself up.
No... RAID 5 sets themselves work just fine. It's swapping to them that is
broken. (I've put RAID 5 sets through intense file-system tests, and they do
just fine...)
> Note that I'm merely speculating based on relatively little knowledge
> about how the internals of raidframe and uvm work. I'm just trying to
> make some, hopefully, reasonable suggestions about where to look.
> Good luck with the testing. If there's anything I can help with to
> further the process of squashing this little annoyance, let me know. I
> love raidframe, and I'd love to see it work completely as one would expect.
If'n'when I have an "easy" patch, I'll get you to test it :) (I suspect there
won't be an "easy" patch though :-/ )
> On Mar 11, 10:57pm, Greg Oster wrote:
> } Subject: Re: Possible serious bug in NetBSD-1.6.1_RC2
> } Brian Buhrow writes:
> } > 2. I'm not certain, but my guess is that the reason raid5 works under 1.5R
> } > and not 1.6 is not so much due to the changes in raidframe itself, but,
> } > rather, changes in the way kernel memory is managed by uvm. They may be
> } > only tuning changes, but something definitely changed.
> }
> } So whatever the problem is, I can reliably lock up my test box in seconds
> } when swapping to a RAID 5 set... Using the same test I couldn't kill my
> } test box when swapping to a RAID 1 set.
> }
> } Workaround to the problem: Swap to a RAID 1 set instead of a RAID 5 set.
> } While reads might not be as fast, writes will be (likely) 4x faster, which
> } should improve your swapping/paging performance too!
> }
> } Figuring out what the problem is will require some additional testing,
> } and that will have to wait until at least tomorrow... :-}
> }
> } Thanks to everyone who helped narrow down the problem...
> }
> } Later...
> }
> } Greg Oster
> >-- End of excerpt from Greg Oster
>
Later...
Greg Oster