Subject: kern/28971: NFS thread goes into readdir tight loop
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <eravin@panix.com>
List: netbsd-bugs
Date: 01/15/2005 02:27:00
>Number: 28971
>Category: kern
>Synopsis: NFS thread goes into readdir tight loop
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 15 02:27:00 +0000 2005
>Originator: Ed Ravin
>Release: NetBSD 2.0
>Organization:
PANIX
>Environment:
NetBSD netbsd2.0.client 2.0 NetBSD 2.0 (PANIX-STD) #2: Mon Jan 10 17:12:07 EST 2005 root@netbsd2.0.client:/devel/netbsd/2.0/src/sys/arch/i386/compile/PANIX-STD i386
>Description:
We received an alarm that our NFS server was handling more than 4000 NFS operations per second, usually a sign of a runaway program on a client box.
The client turned out to be a NetBSD 2.0 system. It was in a tight loop sending out the same READDIR request to the NFS server over and over again. See trace below. There was no process that seemed to account for the requests. I used lsof, fstat, ps, and top, and could not find any process with a file open to the NFS server. Four NFS threads were configured. One of them showed high CPU time with ps, the others were idle. I tried reducing the number of kernel NFS threads to zero using sysctl but the high-usage thread remained. The only fix was to reboot the box.
Sample of trace follows. More of trace is available upon request.
17:16:02.593074 IP (tos 0x0, ttl 64, id 25772, offset 0, flags [none], length: 156) netbsd2.0.client.691067607 > netapp-6.1.server.nfs: 128 readdir fh Unknown/81DA0E00CBCE1D0020000000000EDA81CBCE1D00B121000081DA0E00CBCE1D00 8192 bytes @ 0x1000 verf 0000000000000000
17:16:02.593196 IP (tos 0x0, ttl 64, id 15741, offset 0, flags [none], length: 160) netapp-6.1.server.nfs > netbsd2.0.client.691067607: reply ok 132 readdir POST: DIR 755 ids 0/0 sz 0x1000 nlink 13 rdev 0/0 fsid 0x21b1 fileid 0xeda81 a/m/ctime 1104963361.850010000 1101258830.589001000 1101258830.589001000 verf 0000000000000000
17:16:02.593303 IP (tos 0x0, ttl 64, id 25773, offset 0, flags [none], length: 156) netbsd2.0.client.691067608 > netapp-6.1.server.nfs: 128 readdir fh Unknown/81DA0E00CBCE1D0020000000000EDA81CBCE1D00B121000081DA0E00CBCE1D00 8192 bytes @ 0x1000 verf 0000000000000000
17:16:02.593421 IP (tos 0x0, ttl 64, id 15997, offset 0, flags [none], length: 160) netapp-6.1.server.nfs > netbsd2.0.client.691067608: reply ok 132 readdir POST: DIR 755 ids 0/0 sz 0x1000 nlink 13 rdev 0/0 fsid 0x21b1 fileid 0xeda81 a/m/ctime 1104963361.850011000 1101258830.589001000 1101258830.589001000 verf 0000000000000000
17:16:02.593526 IP (tos 0x0, ttl 64, id 25774, offset 0, flags [none], length: 156) netbsd2.0.client.691067609 > netapp-6.1.server.nfs: 128 readdir fh Unknown/81DA0E00CBCE1D0020000000000EDA81CBCE1D00B121000081DA0E00CBCE1D00 8192 bytes @ 0x1000 verf 0000000000000000
>How-To-Repeat:
We don't know how to repeat it. It happened to us twice so far, on two separate Netapp / NetBSD 2.0 client boxes, on the same day. No other occurences.
>Fix: