Subject: kern/5233: NFS panic on reboot after network has gone south
To: None <gnats-bugs@gnats.netbsd.org>
From: None <cgd@NetBSD.ORG>
List: netbsd-bugs
Date: 03/30/1998 09:21:21
>Number: 5233
>Category: kern
>Synopsis: on reboot, system panic in NFS code when network was hung
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 30 09:35:00 1998
>Last-Modified:
>Originator: Chris G. Demetriou
>Organization:
Kernel Hackers 'r' Us
>Release: NetBSD 1.3.1
>Environment:
NetBSD/i386 1.3.1 kernel (built from 1.3 sources + 1.3.1 patch), with
1.3 user-land. PPro 200, 64MB of RAM.
>Description:
[ I searched for other PRs with 'NFS' in the text; this PR
may be related to 4115 or 2893, but it has a different panic,
a different trace, and is for a different version of the system.
It does seem to describe the same problem as 3072, except that
3072 was reported against 1.2B and this is 1.3.1. 8-]
Rebooted one of my PCs after the ethernet card driver got wedged
(interrupt-related lossage, card wasn't getting its interrupts,
but the reason for the network lossage itself seems irrelevant).
Several processes were hung, trying to do NFS operations to
another system on the network (obviously, given that the ethernet
driver was hung, those operations couldn't complete).
On reboot, after the disks were synced, the system crashed (the
traceback below was copied by hand, so it may contain typos):
syncing disks... 4 4 2 done
vm_fault (0xf0967a00, 0, 1, 0) -> 1
kernel: page fault trap, code=0
Stopped at _nfs_reply+0x9e: movl 0x8(%edx),%ecx
_nfs_reply(f095bac0,200,f093cd00,f48b3c2c,f090ca00) at _nfs_reply+0x9e
_nfs_request(f08c4400,f093cd00,1,f093e000,f092cf80) at _nfs_request+0x3ad
_nfs_getattr(f48b3cb8) at _nfs_getattr+0x336
_nfs_lookup(f48b3d68,f08c4400,f48b3f04,f48b3ee0,f48b3f04) at _nfs_lookup+0x1f5
_lookup(f48b3ee0) at _lookup+0x26e
_namei(f48b3ee0) at _namei+0x176
_vn_open(f48b3ee0,5,0,f01ee3d4,f093e00) at _vn_open+0x170
_sys_open(f48b3ee0,f48b3f88,f48b3f80,0,0) at _sys_open+0xaa
_syscall() at _syscall+0x238
--- syscall (number 5) ---
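A tentative reading of the fault, offered as a sketch and not a diagnosis: the
vm_fault line reports address 0 (assumed here to be the page-truncated fault
address), and the faulting instruction loads from 0x8(%edx). That is what one
would see if %edx held a NULL (or near-NULL) pointer and the code read a field
at offset 8 through it. The struct below is hypothetical and is NOT a real NFS
structure; the helper names and the 4096-byte page size are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed i386 page size; the reported fault address is assumed page-truncated. */
#define PAGE_SIZE_I386 4096u

/* Hypothetical record layout for illustration only (not an NFS structure). */
struct example_rec {
    uint32_t first;   /* offset 0 */
    uint32_t second;  /* offset 4 */
    uint32_t third;   /* offset 8: what "movl 0x8(%edx),%ecx" would read */
};

/* Address touched by "movl disp(%reg),..." given the register contents. */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Round a virtual address down to the start of its page. */
static uint32_t page_trunc(uint32_t va)
{
    return va & ~(PAGE_SIZE_I386 - 1u);
}

/* Check that a NULL base with displacement 8 faults in page 0. */
static void check_null_base_reading(void)
{
    uint32_t va = effective_address(0u, (uint32_t)offsetof(struct example_rec, third));
    assert(va == 8u);
    assert(page_trunc(va) == 0u);
}
```

Any %edx value that puts the load inside the first page would print the same
address, so this only narrows the cause to a NULL or near-NULL pointer being
dereferenced in nfs_reply; it does not identify which pointer.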
Other possibly-useful info:
_curproc = f093e000
which is:
PID    proc        addr        uid  ppid  pgrp   flag    stat  em      comm
21825  0xf093e000  0xf48b2000  0    1     21825  004006  2     netbsd  vi
other processes (runnable except where noted):
reboot
csh
df
umount
vi
tip (blocked on ttyout; IE+, wouldn't die; interrupt lossage
re: the serial port interrupt conflicting with the
enet card's interrupt)
csh (ppwait)
pagedaemon (paged)
init (wait)
swapper (scheduler)
tip (zombie)
>How-To-Repeat:
Hard-mount some NFS file systems, break the network so that the
client can no longer talk to the server, let a few processes hang
on NFS operations, and then try to reboot?
>Fix:
Unknown.
>Audit-Trail:
>Unformatted: