Subject: how many kern.maxvnodes is too many/not enough?
To: None <tech-kern@netbsd.org>
From: Stephen Jones <smj@cirr.com>
List: tech-kern
Date: 01/15/2004 23:23:38
I've hired a relatively expensive NetBSD developer to help troubleshoot
a consistent vnlock deadlock (or an extremely long lock, or a pile of
locks that never get unlocked) that we've been seeing for months on
our heavily used NFS clients.
On my own, I'm trying to learn and understand more about vnode locking
and why an NFS client would run into this problem. I've been able to
verify this problem with another high volume site, but it doesn't seem
to be an issue for others. Originally I assumed this was vnode
deadlocking, but that is probably wrong. What is probably happening is
that lots of vnodes are being used and eventually a pile of them causes
a temporary deadlock. In some cases, a client can recover on its own
if traffic slows down, but in most cases waiting is not an option and
the client must be rebooted to regain access.
I've tried setting kern.maxvnodes to various sizes and I was wondering
what value others would consider reasonable and how that value
should be determined. I've tried values between 16k and 64k.
In the typical vnode lock we see every day on at least one or two of our
clients (it plays no favourites), a ps from the debugger might look
like this:
  PID  PPID  PGRP   UID S FLAGS   COMMAND    WAIT
13062  4982  2320 62035 3 0x4086  sleep      nanosle
13059 13046 13059 18375 3 0x4006  mail       vnlock
13055 12888 12879    92 3 0x4084  sleep      nanosle
13046 13008 13046 18375 3 0x4086  tcsh       pause
13008   590 13008     0 3 0x184   sshd       select
12888 12883 12879    92 3 0x4084  ksh        pause
12883 12879 12879    92 3 0x4084  ksh        pause
12879 12875 12879     0 3 0x4084  sh         wait
12875   637   637     0 3 0x84    cron       piperd
12819     1 12819 35449 3 0x4004  ksh        vnlock
12716 12715   631 32767 3 0x4004  finger     vnlock
12715   631   631 32767 3 0x4084  fingerd    piperd
12633 12632   631 32767 3 0x4004  finger     vnlock
12632   631   631 32767 3 0x4084  fingerd    piperd
11332   596   596 32767 3 0x184   httpd      netio
11327   596   596 32767 3 0x184   httpd      netcon
11319   596   596 32767 3 0x184   httpd      netcon
11138  9509 11138  8904 3 0x4006  ksh        vnlock
11016     1 11000 39284 3 0x4006  ksh        vnlock
10918     1 10900 47900 3 0x4006  ksh        vnlock
10767     1 10751 46262 3 0x4006  ksh        vnlock
 9513   596   596 32767 3 0x184   httpd      netio
 9509   590  9509     0 3 0x184   sshd       select
 7351     1  7335 39284 3 0x4006  ksh        vnlock
 5773     1  5757 39284 3 0x4006  ksh        vnlock
 5379     1  5361 39284 3 0x4006  ksh        vnlock
 4992  4991  2320 62035 3 0x4006  sh         vnlock
 4991  2322  2320 62035 3 0x4086  getchar    wait
 4982  2322  2320 62035 3 0x86    ksh        pause
 4931     1  4911  6326 3 0x4006  ksh        vnlock
 3221     1  3221 59077 3 0x4004  ksh        vnlock
 3140     1  3140 59077 3 0x4006  ksh        vnlock
 2936     1  2917  6326 3 0x4006  ksh        vnlock
 2659     1  2659 59077 3 0x4006  ksh        vnlock
 2254     1  2238 40035 3 0x4006  ksh        vnlock
29854     1 29854 34321 3 0x4006  ksh        vnlock
29681     1 29681 35449 3 0x4006  ksh        vnlock
28351     1 28351 35449 3 0x4006  ksh        vnlock
28122     1 28122 35449 3 0x4006  ksh        vnlock
25044 24821 24821 53559 3 0x4006  ksh        vnlock
24821 24714 24821 53559 3 0x400f  mutt       genput
24714     1 24714 53559 3 0x4006  ksh93      vnlock
22242   596   596 32767 3 0x184   httpd      netcon
12560     1 12560 35449 3 0x4006  bash       vnlock
11733 11621 11733 52788 3 0x4086  bash       ttyin
11621   590 11621     0 3 0x184   sshd       select
28957 20518 28957  3428 3 0x5086  emacs      select
20518 20490 20518  3428 3 0x4086  bash       wait
20490   590 20490     0 3 0x184   sshd       select
18102     1 18102  8308 3 0x4006  pine       vnlock
 2635     1  2635 54984 3 0x4006  bash       vnlock
26638   596   596 32767 3 0x184   httpd      netcon
26637   596   596 32767 3 0x184   httpd      netio
26636   596   596 32767 3 0x184   httpd      netio
26635   596   596 32767 3 0x184   httpd      netio
26634   596   596 32767 3 0x184   httpd      netcon
26633   596   596 32767 3 0x184   httpd      netcon
26632   596   596 32767 3 0x184   httpd      netcon
26631   596   596 32767 3 0x184   httpd      netcon
26630   596   596 32767 3 0x184   httpd      netio
26629   596   596 32767 3 0x184   httpd      netcon
26628   596   596 32767 3 0x184   httpd      netcon
26627   596   596 32767 3 0x184   httpd      netcon
26626   596   596 32767 3 0x184   httpd      netio
26625   596   596 32767 3 0x184   httpd      netcon
26624   596   596 32767 3 0x184   httpd      netcon
26623   596   596 32767 3 0x184   httpd      netcon
26622   596   596 32767 3 0x184   httpd      netio
26621   596   596 32767 3 0x184   httpd      netio
26620   596   596 32767 3 0x184   httpd      netio
26619   596   596 32767 3 0x184   httpd      netcon
26618   596   596 32767 3 0x184   httpd      netcon
26617   596   596 32767 3 0x184   httpd      netcon
26616   596   596 32767 3 0x184   httpd      netcon
26615   596   596 32767 3 0x184   httpd      netcon
26614   596   596 32767 3 0x184   httpd      netcon
26613   596   596 32767 3 0x184   httpd      netcon
26612   596   596 32767 3 0x184   httpd      netcon
26611   596   596 32767 3 0x184   httpd      netcon
26610   596   596 32767 3 0x184   httpd      netio
26609   596   596 32767 3 0x184   httpd      netcon
25603 25549 25603 49211 3 0x5006  emacs      vnlock
25593     1 25592 49211 3 0x86    twait      nanosle
25549 25489 25549 49211 3 0x4086  zsh        pause
25489   590 25489     0 3 0x184   sshd       select
11343     1 11343 50983 3 0x400f  mutt       vnlock
19243 12348 19243 35058 3 0x400f  mutt       vnlock
14238 14234 14238 64910 3 0x4086  ksh        ttyin
14234   590 14234     0 3 0x184   sshd       select
 2322  2321  2320 62035 3 0x4086  ksh        piperd
 2321  2320  2320 62035 3 0x4086  sh         wait
 2320  2146  2320 62035 3 0x4186  com        wait
 2146  2134  2146 62035 3 0x4086  tcsh       pause
 2134   590  2134     0 3 0x184   sshd       select
12348 12326 12348 35058 3 0x4086  ksh        pause
12326   590 12326     0 3 0x184   sshd       select
  641     1   641   100 3 0x4106  login      vnlock
  637     1   637     0 3 0x84    cron       nanosle
  631     1   631     0 3 0x84    inetd      select
  613     1   613     0 3 0x84    timed      select
  596     1   596     0 3 0x84    httpd      select
  592     1   592     0 3 0x184   sshd       select
  590     1   590     0 3 0x184   sshd       select
  152     1   152     0 3 0x84    rpc.lockd  select
  136     1   136     0 3 0x84    xfs        select
  111     1   111     0 3 0x84    ypbind     select
  106     1   106     0 3 0x84    rpcbind    select
   93     1    93     0 3 0x84    syslogd    select
   59     0     0     0 3 0x20284 nfsio      nfsidl
   58     0     0     0 3 0x20284 nfsio      nfsidl
   57     0     0     0 3 0x20284 nfsio      nfsidl
   56     0     0     0 3 0x20284 nfsio      nfsidl
   55     0     0     0 3 0x20284 nfsio      nfsidl
   54     0     0     0 3 0x20284 nfsio      nfsidl
   53     0     0     0 3 0x20284 nfsio      nfsidl
   52     0     0     0 3 0x20284 nfsio      nfsidl
   51     0     0     0 3 0x20284 nfsio      nfsidl
   50     0     0     0 3 0x20284 nfsio      nfsidl
   49     0     0     0 3 0x20284 nfsio      nfsidl
   48     0     0     0 3 0x20284 nfsio      nfsidl
   47     0     0     0 3 0x20284 nfsio      nfsidl
   46     0     0     0 3 0x20284 nfsio      nfsidl
   45     0     0     0 3 0x20284 nfsio      nfsidl
   44     0     0     0 3 0x20284 nfsio      nfsidl
   43     0     0     0 3 0x20284 nfsio      nfsidl
   42     0     0     0 3 0x20284 nfsio      nfsidl
   41     0     0     0 3 0x20284 nfsio      nfsidl
   40     0     0     0 3 0x20284 nfsio      nfsidl
    5     0     0     0 3 0x20204 aiodoned   aiodone
    4     0     0     0 3 0x20204 ioflush    syncer
    3     0     0     0 3 0x20204 reaper     reaper
    2     0     0     0 3 0x20204 pagedaemon pgdaemo
    1     0     1     0 3 0x4084  init       wait
    0    -1     0     0 3 0x20204 swapper    schedul
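
For anyone comparing listings like the one above, a quick tally of
processes per wait channel makes the vnlock pile-up easier to see than
reading the rows by eye. The following is only a rough sketch in Python:
the names tally_wait_channels and SAMPLE are made up for illustration,
and it assumes each row has the seven whitespace-separated fields
PID PPID PGRP UID S FLAGS COMMAND followed by WAIT, either on the same
line or wrapped onto the next one. SAMPLE holds just three rows copied
from the listing.

```python
from collections import Counter

# Three rows copied from the listing; the last one has WAIT on the same line.
SAMPLE = """\
PID PPID PGRP UID S FLAGS COMMAND
WAIT
13062 4982 2320 62035 3 0x4086 sleep
nanosle
13059 13046 13059 18375 3 0x4006 mail
vnlock
12819 1 12819 35449 3 0x4004 ksh vnlock
"""


def tally_wait_channels(ps_text):
    """Count processes per wait channel in a ps listing (hypothetical helper).

    Handles rows where WAIT is the eighth field on the line and rows
    where WAIT has wrapped onto a line of its own.
    """
    counts = Counter()
    pending = False  # True while the previous row still lacks its WAIT field
    for line in ps_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "PID":
            continue  # skip blank lines and the column header
        if len(fields) == 1:
            # A lone token is the wrapped WAIT value of the previous row.
            # The wrapped "WAIT" header is ignored because no row is pending.
            if pending:
                counts[fields[0]] += 1
                pending = False
        elif len(fields) >= 8:
            counts[fields[7]] += 1  # complete row, WAIT is the eighth field
            pending = False
        elif len(fields) == 7:
            pending = True  # row without WAIT; expect it on the next line
    return counts


if __name__ == "__main__":
    for channel, n in tally_wait_channels(SAMPLE).most_common():
        print(channel, n)
```

Feeding it the full listing instead of SAMPLE would give the count of
processes sleeping on vnlock versus netcon, netio, select and so on,
which could then be compared between a healthy client and a stuck one.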