Subject: kern/2061: Uninterruptible processes.
To: None <gnats-bugs@NetBSD.ORG>
From: David Gilbert <dgilbert@jaywon.pci.on.ca>
List: netbsd-bugs
Date: 02/10/1996 19:57:44
>Number: 2061
>Category: kern
>Synopsis: processes get stuck in uninterruptible disk wait
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Feb 10 20:20:08 1996
>Last-Modified:
>Originator: David Gilbert
>Organization:
----------------------------------------------------------------------------
|David Gilbert, PCI, Richmond Hill, Ontario. | Two things can only be |
|Mail: dgilbert@jaywon.pci.on.ca | equal if and only if they |
|http://www.pci.on.ca/~dgilbert | are precisely opposite. |
---------------------------------------------------------GLO----------------
>Release: 1.1
>Environment:
System: NetBSD repeat 1.1A NetBSD 1.1A (REPEAT) #37: Fri Feb 9 23:31:14 EST 1996 root@:/u/dgilbert/src/sys/arch/sparc/compile/REPEAT sparc
>Description:
	You may want to reassign this to port-sparc, but I thought I'd
give it a more general audience first. It may also be related to the
clock-stopping problem on the sparc, though not exclusively so; I think
the two share only a common cause (a race condition).
	What happens is that some processes (usually, but not always,
very busy ones) get stuck in uninterruptible disk wait (state D in the
STAT column of ps -ax). They never leave this state and are immune
even to kill -9 from root.
	Typical examples are various parts of cnews, but I have also had
pppd get stuck there, as well as ordinary stock NetBSD executables
(such as find).
>How-To-Repeat:
Run a full newsfeed :).
>Fix:
I'm thinking that this is some form of race condition. I
could be wrong.
>Audit-Trail:
>Unformatted: