Subject: Re: i386 1.4Q hangs nonrandomly?
To: Ethan Solomita <ethan@geocast.com>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: current-users
Date: 01/27/2000 12:46:18
I have been working on this problem for the last two weeks (PR kern/9197). Some buffers
are continuously cycling through the b_actf queue. All have the same cylinder,
only the blknum varies. Because of a problem in the (old, pre-B_ORDERED)
sys/kern/subr_disk.c they go to the first part of the list and block the
remaining buffers. This leads to empty buffer freelists and a total lockup.

sys/kern/subr_disk.c::disksort

	/*
	 * If we lie after the first (currently active) request, then we
	 * must locate the second request list and add ourselves to it.
	 */
	bq = ap->b_actf;
	if (bp->b_cylinder < bq->b_cylinder) {

must read (to be exact):

	bq = ap->b_actf;
	if (bp->b_cylinder < bq->b_cylinder ||
	   (bp->b_cylinder == bq->b_cylinder && bp->b_blkno < bq->b_blkno)) {

or (lazier, fewer comparisons):

	bq = ap->b_actf;
	if (bp->b_cylinder <= bq->b_cylinder) {
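
As a minimal sketch of how the three tests differ, here they are as
stand-alone predicates. The struct req type, the function names and
check_predicates() are made up for illustration and are not kernel code;
only the comparisons mirror the snippets above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct buf fields the test reads. */
struct req {
	long cylinder;	/* plays the role of b_cylinder */
	long blkno;	/* plays the role of b_blkno */
};

/* Original test: only a strictly lower cylinder goes to the second list. */
static bool
second_list_old(const struct req *bp, const struct req *head)
{
	return bp->cylinder < head->cylinder;
}

/* Exact variant: also defer a lower block number on the same cylinder. */
static bool
second_list_exact(const struct req *bp, const struct req *head)
{
	return bp->cylinder < head->cylinder ||
	    (bp->cylinder == head->cylinder && bp->blkno < head->blkno);
}

/* Lazy variant: defer everything on the same or a lower cylinder. */
static bool
second_list_lazy(const struct req *bp, const struct req *head)
{
	return bp->cylinder <= head->cylinder;
}

/* Compare the three tests for requests on the head's own cylinder. */
static void
check_predicates(void)
{
	struct req head = { 10, 500 };		/* currently active request */
	struct req behind = { 10, 400 };	/* same cylinder, lower block */
	struct req ahead = { 10, 600 };		/* same cylinder, higher block */

	assert(!second_list_old(&behind, &head));
	assert(second_list_exact(&behind, &head));
	assert(second_list_lazy(&behind, &head));

	assert(!second_list_old(&ahead, &head));
	assert(!second_list_exact(&ahead, &head));
	assert(second_list_lazy(&ahead, &head));
}
```

With the active request at cylinder 10, block 500, a new request at
cylinder 10, block 400 stays in the first list under the old test but is
deferred by both alternatives. A request at cylinder 10, block 600 stays in
the first list under the exact test and is deferred only by the lazy one.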

Is this a correct fix? Comments?

-- 
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)

> maximum entropy wrote:
> > 
> > The system happily chugged away for about 3 hours like this, then
> > locked up solid.
> > 
> > I'm totally out of ideas now...
> > 
> 	I haven't been following all of this conversation, but there's
> something I'm working on which is probably worth mentioning. There is a
> soft updates "livelock" under heavy use for which I'll be submitting a
> fix soon. The main symptom is that the disk light will stay on, since
> the disk is being continuously written to, yet no forward progress is
> being made and the livelock never ends.
> 
> 	I realize that isn't the explanation for all of this, but it seems like
> this discussion has encompassed more than one bug, and I expect that I'm
> not the only one who has suffered with the livelock bug.
> 	-- Ethan