Subject: kern/529: processes hanging in D state forever
To: None <gnats-admin@sun-lamp.cs.berkeley.edu>
From: None <danielce@ee.mu.oz.au>
List: netbsd-bugs
Date: 10/20/1994 01:50:06
>Number: 529
>Category: kern
>Synopsis: processes hanging in D state forever
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: gnats-admin (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Oct 20 01:50:04 1994
>Originator: daniel carosone
>Organization:
bozo software foundation
>Release:
>Environment:
System: NetBSD oink 1.0_BETA NetBSD 1.0_BETA (_oink3_) #16: Fri Sep 23 10:20:45 EST 1994 dan@oink:/home/c/l/NetBSD/src/sys/arch/sparc/compile/_oink3_ sparc
>Description:
I've mentioned this before, some time ago. The problem seemed to go
away, so I figured it had been fixed, but it's come back again
recently.
Processes that do a lot of filesystem activity, such as a find or a
make of the world, will sometimes get stuck in the D state
forever. They're obviously unkillable. It seems to happen only after
the machine has been up for some time, which may explain why I thought
the problem had gone away, since average uptime over the past months
has been lower than usual.
The rest of the processes on the machine are fine.
Here's the current list of what looks to be stuck:
dan   19753  0.0  0.0   348   792 p0 D   Tue08PM 0:00.90 make all
root  21067  0.0  0.0   108   496 ?? D   Wed02AM 0:02.95 find /lfs/f -xdev ( ( -type f ( -perm -u+s -or -perm -g+s ) ) -or -
dan   23445  0.0  0.0   168   588 p0 D+   8:02PM 0:03.82 find / -name pgp -print
root  23941  0.0  0.0   108   496 ?? D    2:02AM 0:01.28 find /lfs/f -xdev ( ( -type f ( -perm -u+s -or -perm -g+s ) ) -or -
dan   12065  0.0  0.0   284   740 p1 D+   5:48PM 0:00.40 make
dan   12366  0.0  0.0   236   708 p1 D+   5:51PM 0:00.33 make all
dan   12379  0.0  0.0   640  1192 p3 Ds+  6:02PM 0:01.19 -tcsh (tcsh)
(The /lfs/* file systems are not LFS, just a bad choice of name that
I think I'll change soon.)
The makes are attempts to rebuild the world that got stuck (also in
/lfs/f in this case, though it hasn't always been there).
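For anyone wanting to pull a similar list out of ps output mechanically,
here is a minimal sketch. It is not something taken from the affected
machine: it assumes the "ps aux" column layout of the listing above (PID
as the second field, STAT as the eighth, the command last), and the
function name stuck_processes and the "init" sample line are purely
illustrative.

```python
def stuck_processes(ps_lines):
    """Return (pid, stat, command) for each line whose STAT starts with 'D'.

    Assumes 'ps aux'-style columns:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
    """
    stuck = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something unparseable
        stat = fields[7]
        if stat.startswith("D"):
            stuck.append((int(fields[1]), stat, fields[10]))
    return stuck


if __name__ == "__main__":
    sample = [
        "USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND",
        "dan 19753 0.0 0.0 348 792 p0 D Tue08PM 0:00.90 make all",
        # Hypothetical non-stuck process, included only for contrast.
        "root 1 0.0 0.0 100 200 ?? Is Mon01PM 0:00.10 init",
        "dan 12379 0.0 0.0 640 1192 p3 Ds+ 6:02PM 0:01.19 -tcsh (tcsh)",
    ]
    for pid, stat, command in stuck_processes(sample):
        print(pid, stat, command)
```

Matching on the leading "D" rather than the exact string catches the
D+ and Ds+ variants as well.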
Once it sets in, it seems to get worse. Witness this attempt to see if
there was a specific inode that was causing the trouble:
dan@oink [18:09][36]/lfs/f> find .
.
./lost+found
./a
./l
^C^C^C^C
^C^C
^\^\^\
Sources were current when the kernel was last built (the date in the
kernel version string above). The machine has been up 20 days.
After a reboot, if it follows past patterns, it will be fine again for
a while. I'm leaving it up in case there's some useful information I
can get out of it for you, though processes are hanging left and right
by now. Sorry, there's no DDB in the running kernel.
One other possibly relevant piece of information: I'm running amd on
the machine now, serving /home. /lfs is where local file systems get
mounted, and /home/? is a link entry on the same host. However, when I
saw these problems before (4 months ago?) I wasn't running amd and had
no NFS mounts, though there were some NFS exports.
>How-To-Repeat:
I suppose this doesn't happen to anyone else. I don't do anything
special to provoke it; it just happens.
>Fix:
Cry for help! :)
>Audit-Trail:
>Unformatted: