Subject: Re: i386 1.4Q hangs nonrandomly?
To: Ethan Solomita <ethan@geocast.com>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: current-users
Date: 01/27/2000 12:46:18
I have been working on this problem for the last two weeks (PR kern/9197). Some
buffers are continuously cycling through the b_actf queue. All have the same
cylinder; only the block number (b_blkno) varies. Because of a problem in the
(old, pre-B_ORDERED) sys/kern/subr_disk.c, they keep being sorted into the first
part of the list, ahead of the other pending requests, and block the remaining
buffers. This leads to empty buffer freelists and a total lockup.
In sys/kern/subr_disk.c, disksort(), the code
	/*
	 * If we lie after the first (currently active) request, then we
	 * must locate the second request list and add ourselves to it.
	 */
	bq = ap->b_actf;
	if (bp->b_cylinder < bq->b_cylinder) {
must read (to be exact):
	bq = ap->b_actf;
	if (bp->b_cylinder < bq->b_cylinder ||
	    (bp->b_cylinder == bq->b_cylinder && bp->b_blkno < bq->b_blkno)) {
or (lazier, fewer comparisons):
	bq = ap->b_actf;
	if (bp->b_cylinder <= bq->b_cylinder) {
Is this a correct fix? Comments?
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)
> maximum entropy wrote:
> >
> > The system happily chugged away for about 3 hours like this, then
> > locked up solid.
> >
> > I'm totally out of ideas now...
> >
> I haven't been following all of this conversation, but there's
> something I'm working on which is probably worth mentioning. There is a
> soft updates "livelock" under heavy use for which I'll be submitting a
> fix soon. The main symptom is that the disk light will stay on, since
> the disk is being continuously written to, yet no forward progress is
> being made and the livelock never ends.
>
> I realize that isn't the explanation for all of this, but it seems like
> this discussion has encompassed more than one bug, and I expect that I'm
> not the only one who has suffered with the livelock bug.
> -- Ethan