Subject: kern/20325: processes stuck waiting on vnlock
To: None <gnats-bugs@gnats.netbsd.org>
From: Martin Husemann <martin@aprisoft.de>
List: netbsd-bugs
Date: 02/13/2003 09:07:15
>Number: 20325
>Category: kern
>Synopsis: processes stuck waiting on vnlock
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 13 00:08:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator: Martin Husemann
>Release: NetBSD 1.6.1_RC1
>Organization:
>Environment:
System: NetBSD burgvogt.aprisoft.de 1.6.1_RC1 NetBSD 1.6.1_RC1 (VOGT) #0: Tue Feb 11 09:07:24 CET 2003 martin@beasty.aprisoft.de:/usr/src-1-6/sys/arch/sparc/compile/VOGT sparc
Architecture: sparc
Machine: sparc
>Description:
Some time ago I upgraded my sparc/1.6 router machine to the latest version
on the 1.6 branch (NFS root on an i386 1.6.1_RC1 system). Since then it has
been "wedging" once or twice a week. It continues to route packets for a
while, but my ISP disconnects the line after 24 hours, and we need the
ip-down/ip-up scripts to take care of routing changes (new IP). Those do
not seem to run, so in the end it loses connectivity completely.
I have DEBUG, DIAGNOSTIC and LOCKDEBUG in the kernel now.
Breaking into ddb works fine.
It seems the sshd and getty processes are all stuck waiting on vnlock:
db> tr
zstty_stint(0xf02f2c68, 0x0, 0xf0116878, 0xf1958000, 0xf0196000, 0x104050a) at zstty_stint+0x88
zsc_intr_hard(0x8, 0xf02efe80, 0xf0172c00, 0xfe000000, 0x8de, 0x100) at zsc_intr_hard+0x68
zshard(0x0, 0xf010f6f0, 0xf00, 0x0, 0x1, 0xf0197df8) at zshard+0x40
sparc_interrupt44c(0x0, 0x0, 0xf0137bec, 0x0, 0xffffffff, 0x2) at sparc_interrupt44c+0x170
mi_switch(0xf0195ee8, 0x3c85, 0xf01706a8, 0xf01981e4, 0x0, 0x70) at mi_switch+0x210
ltsleep(0x0, 0x4, 0xf014f690, 0x0, 0x0, 0xf017d7bc) at ltsleep+0x24c
uvm_scheduler(0xf0195ee0, 0x1, 0xf0195c00, 0xf0142f50, 0xf0196000, 0xf01961e8) at uvm_scheduler+0x114
db> ps
PID    PPID  PGRP UID S    FLAGS COMMAND    WAIT
1515    173  1515   0 3      0x4 sshd       vnlock
1514   1513  1513   0 3 0x100004 sh         netio
1513   1511  1513   0 3   0x4084 sh         wait
1512   1510  1512   0 3 0x100114 cron       nfsrcvl
1511    197   197   0 3     0x84 cron       piperd
1510    197   197   0 3      0x4 cron       ppwait
346       1   346   0 3   0x4006 getty      vnlock
197       1   197   0 3      0x4 cron       nfsrcvl
189       1   189   0 3     0x84 ifwatchd   netio
186       1   186   0 3     0x84 inetd      pause
173       1   173   0 3      0x4 sshd       vnlock
155       1   155   0 3      0x4 ntpd       nfsrcvl
78        1    78   0 3      0x4 syslogd    nfsrcvl
11        0     0   0 3  0x20204 aiodoned   aiodone
10        0     0   0 3  0x20204 ioflush    nfsrcvl
9         0     0   0 3  0x20204 reaper     reaper
8         0     0   0 3  0x20204 pagedaemon pgdaemo
7         0     0   0 3  0x20284 nfsio      nfsidl
6         0     0   0 3  0x20284 nfsio      nfsidl
5         0     0   0 3  0x20284 nfsio      nfsidl
4         0     0   0 3  0x20284 nfsio      nfsidl
3         0     0   0 3  0x20204 scsibus1   sccomp
2         0     0   0 3  0x20204 scsibus0   sccomp
1         0     1   0 3   0x4084 init       wait
0        -1     0   0 3  0x20204 swapper    schedul
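To see at a glance which wait channels dominate, the ps listing above can
be tallied with a small script. This is only a sketch: the name
tally_wait_channels is made up for illustration, the sample rows are a
subset copied from the listing, and it assumes the wait channel is always
the last whitespace-separated column.

```python
from collections import Counter

# A subset of rows copied from the ddb "ps" listing
# (columns: PID PPID PGRP UID S FLAGS COMMAND WAIT).
DDB_PS = """\
PID PPID PGRP UID S FLAGS COMMAND WAIT
1515 173 1515 0 3 0x4 sshd vnlock
1512 1510 1512 0 3 0x100114 cron nfsrcvl
346 1 346 0 3 0x4006 getty vnlock
197 1 197 0 3 0x4 cron nfsrcvl
173 1 173 0 3 0x4 sshd vnlock
155 1 155 0 3 0x4 ntpd nfsrcvl
78 1 78 0 3 0x4 syslogd nfsrcvl
10 0 0 0 3 0x20204 ioflush nfsrcvl
"""


def tally_wait_channels(text):
    """Count processes per wait channel (hypothetical helper).

    Assumes each data row has at least eight whitespace-separated
    fields, starts with a numeric PID, and ends with the wait channel.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for chan, n in tally_wait_channels(DDB_PS).most_common():
        print(f"{n:3d} {chan}")
```

On the full listing this kind of tally makes it easy to see that, apart
from the processes in vnlock, several others (cron, ntpd, syslogd,
ioflush) are sleeping in nfsrcvl.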
>How-To-Repeat:
Run 1.6.1_RC1 for a while? Not sure.
The NFS server (running a system compiled from the same sources) seems to be
fine, so perhaps being an NFS client is what matters here?
>Fix:
wish I had one...
>Release-Note:
>Audit-Trail:
>Unformatted: