Subject: vnode/nfs deadlocking - how much for this bug's head?
To: None <tech-kern@netbsd.org, tech-net@netbsd.org>
From: Stephen Jones <smj@cirr.com>
List: tech-kern
Date: 01/17/2004 17:51:25
After using David Brownlee's and Perry Metzger's suggestions for
kern.maxvnodes yesterday
I've experienced 5 'lockups' which I'm calling a vnode deadlock. I
also took the suggestion
to run 1.6.2rc4 (Jan 17th) on one of the clients, but it doesn't seem
to have made a difference.
A couple of things to note:
* most clients are running under 256 processes at any given time (out
of 8192)
* most clients have under 256 files open at any given time (out of 8192)
* kern.maxvnode is 250000 (25% of RAM) on the clients and they will
continue to allocate
vnodes until they reach that limit. (If a machine can run that
long!)
* swap is allocated on all clients, but is never used.
Another problem is that the client will continue to serve what requests
it gets until it eventually
locks up with the majority of its processes in 'vnlock' states which
means that if you get to
the machine 10 or so minutes after the first deadlock most processes
will report 'swapped out'
if you try to trace them.
In this particular instance, I logged in and ran 'df' .. then ^T and
found my process status was 'vnlock'
so I broke to the debugger and traced my process:
trace: pid 29792 at 0xfffffe0029953ad8
mi_switch() at mi_switch+0x1c0
ltsleep() at ltsleep+0x34c
lockmgr() at lockmgr+0xb38
genfs_lock() at genfs_lock+0x28
vn_lock() at vn_lock+0xc8
vget() at vget+0x14c
nfs_root() at nfs_root+0x3c
lookup() at lookup+0x6c4
namei() at namei+0x4a8
sys_statfs() at sys_statfs+0x60
syscall_plain() at syscall_plain+0x154
XentSys() at XentSys+0x58
PID PPID PGRP UID S FLAGS COMMAND
WAIT
29792 630 29792 100 3 0x4106 df
vnlock
29788 29785 29788 0 3 0x10 cron
vnlock
29787 29784 29787 0 3 0x10 cron
vnlock
29786 29783 29786 0 3 0x10 cron
vnlock
29785 625 625 0 3 0 cron
ppwait
29784 625 625 0 3 0 cron
ppwait
29783 625 625 0 3 0 cron
ppwait
29781 13179 29781 51494 3 0x4082 vi
ttyin
29779 620 620 0 3 0x4000 ftpd
vnlock
29778 29772 29778 0 3 0x10 cron
vnlock
29777 29771 29777 0 3 0x10 cron
vnlock
29776 29770 29776 0 3 0x10 cron
vnlock
29775 29769 29775 0 3 0x10 cron
vnlock
29774 29768 29774 0 3 0x10 cron
vnlock
29773 29767 29773 0 3 0x10 cron
vnlock
29772 625 625 0 3 0 cron
ppwait
29771 625 625 0 3 0 cron
ppwait
29770 625 625 0 3 0 cron
ppwait
29769 625 625 0 3 0 cron
ppwait
29768 625 625 0 3 0 cron
ppwait
29767 625 625 0 3 0 cron
ppwait
29765 29762 29765 0 3 0x10 cron
vnlock
29764 29761 29764 0 3 0x10 cron
vnlock
29763 29760 29763 0 3 0x10 cron
vnlock
29762 625 625 0 3 0 cron
ppwait
29761 625 625 0 3 0 cron
ppwait
29760 625 625 0 3 0 cron
ppwait
29759 620 620 0 3 0x4000 ftpd
vnlock
29755 620 620 0 3 0x4000 ftpd
vnlock
29754 29750 29754 0 3 0x10 cron
vnlock
29753 29749 29753 0 3 0x10 cron
vnlock
29752 29748 29752 0 3 0x10 cron
vnlock
29751 29747 29751 0 3 0x10 cron
vnlock
29750 625 625 0 3 0 cron
ppwait
29749 625 625 0 3 0 cron
ppwait
29748 625 625 0 3 0 cron
ppwait
29747 625 625 0 3 0 cron
ppwait
29746 620 620 0 3 0x4000 ftpd
vnlock
29744 29738 29744 0 3 0x10 cron
vnlock
29743 29737 29743 0 3 0x10 cron
vnlock
29742 29736 29742 0 3 0x10 cron
vnlock
29741 29735 29741 0 3 0x10 cron
vnlock
29740 29734 29740 0 3 0x10 cron
vnlock
29739 29733 29739 0 3 0x10 cron
vnlock
29738 625 625 0 3 0 cron
ppwait
29737 625 625 0 3 0 cron
ppwait
29736 625 625 0 3 0 cron
ppwait
29735 625 625 0 3 0 cron
ppwait
29734 625 625 0 3 0 cron
ppwait
29733 625 625 0 3 0 cron
ppwait
29732 29728 29732 0 3 0x10 cron
vnlock
29731 29727 29731 0 3 0x10 cron
vnlock
29730 29726 29730 0 3 0x10 cron
vnlock
29729 29725 29729 0 3 0x10 cron
vnlock
29728 625 625 0 3 0 cron
ppwait
29727 625 625 0 3 0 cron
ppwait
29726 625 625 0 3 0 cron
ppwait
29725 625 625 0 3 0 cron
ppwait
29722 25483 25481 4588 3 0x2 ksh
vnlock
29721 29718 29721 0 3 0x10 cron
vnlock
29720 29717 29720 0 3 0x10 cron
vnlock
29719 29716 29719 0 3 0x10 cron
vnlock
29718 625 625 0 3 0 cron
ppwait
29717 625 625 0 3 0 cron
ppwait
29716 625 625 0 3 0 cron
ppwait
29714 620 620 0 3 0x4000 telnetd
vnlock
29713 29707 29713 0 3 0x10 cron
vnlock
29712 29706 29712 0 3 0x10 cron
vnlock
29711 29705 29711 0 3 0x10 cron
vnlock
29710 29704 29710 0 3 0x10 cron
vnlock
29709 29703 29709 0 3 0x10 cron
vnlock
29708 29702 29708 0 3 0x10 cron
vnlock
29707 625 625 0 3 0 cron
ppwait
29706 625 625 0 3 0 cron
ppwait
29705 625 625 0 3 0 cron
ppwait
29704 625 625 0 3 0 cron
ppwait
29703 625 625 0 3 0 cron
ppwait
29702 625 625 0 3 0 cron
ppwait
29699 620 620 0 3 0x4000 ftpd
vnlock
29698 580 29698 0 3 0x100 sshd
vnlock
29695 29692 29695 0 3 0x10 cron
vnlock
29694 29691 29694 0 3 0x10 cron
vnlock
29693 29690 29693 0 3 0x10 cron
vnlock
29692 625 625 0 3 0 cron
ppwait
29691 625 625 0 3 0 cron
ppwait
29690 625 625 0 3 0 cron
ppwait
29688 29687 29687 35058 3 0x100000 mutt
vnlock
29687 26858 29687 35058 3 0x80 mutt
wait
29685 29679 29685 0 3 0x10 cron
vnlock
29684 29678 29684 0 3 0x10 cron
vnlock
29683 29677 29683 0 3 0x10 cron
vnlock
29682 29676 29682 0 3 0x10 cron
vnlock
29681 29675 29681 0 3 0x10 cron
vnlock
29680 29674 29680 0 3 0x10 cron
vnlock
29679 625 625 0 3 0 cron
ppwait
29678 625 625 0 3 0 cron
ppwait
29677 625 625 0 3 0 cron
ppwait
29676 625 625 0 3 0 cron
ppwait
29675 625 625 0 3 0 cron
ppwait
29674 625 625 0 3 0 cron
ppwait
29673 1 24648 55270 3 0x4002 head
vnlock
29671 620 620 0 3 0x4000 ftpd
vnlock
29670 29665 29670 0 3 0x10 cron
vnlock
29669 29664 29669 0 3 0x10 cron
vnlock
29668 29663 29668 0 3 0x10 cron
vnlock
29667 29662 29667 0 3 0x10 cron
vnlock
29666 29661 29666 0 3 0x10 cron
vnlock
29665 625 625 0 3 0 cron
ppwait
29664 625 625 0 3 0 cron
ppwait
29663 625 625 0 3 0 cron
ppwait
29662 625 625 0 3 0 cron
ppwait
29661 625 625 0 3 0 cron
ppwait
29660 580 29660 0 3 0x100 sshd
vnlock
29659 19839 19837 62035 3 0x4000 rm
vnlock
29656 6814 29656 8203 3 0x100002 bash
vnlock
29653 578 29653 0 3 0x100 sshd
vnlock
29652 29648 29652 0 3 0x10 cron
vnlock
29651 29647 29651 0 3 0x10 cron
vnlock
29650 29646 29650 0 3 0x10 cron
vnlock
29649 29645 29649 0 3 0x10 cron
vnlock
29648 625 625 0 3 0 cron
ppwait
29647 625 625 0 3 0 cron
ppwait
29646 625 625 0 3 0 cron
ppwait
29645 625 625 0 3 0 cron
ppwait
29643 578 29643 0 3 0x100 sshd
vnlock
29641 1 29641 61473 3 0x4002 quota
vnlock
29640 620 620 0 3 0x4000 telnetd
vnlock
29638 29635 29638 0 3 0x10 cron
vnlock
29637 29634 29637 0 3 0x10 cron
vnlock
29636 29633 29636 0 3 0x10 cron
vnlock
29635 625 625 0 3 0 cron
ppwait
29634 625 625 0 3 0 cron
ppwait
29633 625 625 0 3 0 cron
ppwait
29631 22483 585 32767 3 0x100 httpd
vnlock
29630 620 620 0 3 0x4000 ftpd
vnlock
29629 29549 585 32767 3 0x100100 httpd
vnlock
29628 578 29628 0 3 0x100 sshd
vnlock
29626 22485 585 32767 3 0x100 httpd
vnlock
29625 578 29625 0 3 0x100 sshd
vnlock
29624 29614 29624 0 3 0x10 cron
vnlock
29623 29618 29623 0 3 0x10 cron
vnlock
29622 29617 29622 0 3 0x10 cron
vnlock
29621 29616 29621 0 3 0x10 cron
vnlock
29620 29615 29620 0 3 0x10 cron
vnlock
29619 29613 29619 0 3 0x10 cron
vnlock
29618 625 625 0 3 0 cron
ppwait
29617 625 625 0 3 0 cron
ppwait
29616 625 625 0 3 0 cron
ppwait
29615 625 625 0 3 0 cron
ppwait
29614 625 625 0 3 0 cron
ppwait
29613 625 625 0 3 0 cron
ppwait
29612 18511 29612 53018 3 0x100012 tcsh
vnlock
29611 578 29611 0 3 0x100 sshd
vnlock
29609 29543 585 32767 3 0x100 httpd
vnlock
29608 29513 585 32767 3 0x100 httpd
vnlock
29607 29507 585 32767 3 0x100100 httpd
vnlock
29606 578 29606 0 3 0x100 sshd
vnlock
29605 29554 585 32767 3 0x100100 httpd
vnlock
29604 29497 585 32767 3 0x100 httpd
vnlock
29603 620 620 0 3 0x4000 ftpd
vnlock
29602 22497 585 32767 3 0x100100 httpd
vnlock
29601 578 29601 0 3 0x100 sshd
vnlock
29600 578 29600 0 3 0x100 sshd
vnlock
29599 578 29599 0 3 0x100 sshd
vnlock
29597 620 620 0 3 0x4000 ftpd
vnlock
29596 29592 29596 0 3 0x10 cron
vnlock
29595 29591 29595 0 3 0x10 cron
vnlock
29594 29590 29594 0 3 0x10 cron
vnlock
29593 29589 29593 0 3 0x10 cron
vnlock
29592 625 625 0 3 0 cron
ppwait
29591 625 625 0 3 0 cron
ppwait
29590 625 625 0 3 0 cron
ppwait
29589 625 625 0 3 0 cron
ppwait
29588 585 585 0 3 0x100 httpd
vnlock
29587 578 29587 0 3 0x100 sshd
vnlock
29586 29496 585 32767 3 0x100 httpd
vnlock
29585 1 620 0 3 0x100 ftpd
vnlock
29584 620 620 0 3 0 inetd
vnlock
29582 22481 585 32767 3 0x100 httpd
vnlock
29581 585 585 0 3 0x100 httpd
vnlock
29580 585 585 0 3 0x100 httpd
vnlock
29579 585 585 0 3 0x100 httpd
vnlock
29578 585 585 0 3 0x100 httpd
vnlock
29577 29545 585 32767 3 0x100100 httpd
vnlock
29574 29566 29566 0 3 0x4000 ksh
vnlock
29573 29568 29568 92 3 0x4000 ksh
vnlock
29572 29569 29569 0 3 0x4000 ksh
vnlock
29571 29570 29567 92 3 0x4000 ksh
vnlock
29570 29567 29567 92 3 0x4080 ksh
pause
29569 29564 29569 0 3 0x4080 sh
wait
29568 29565 29568 0 3 0x4080 sh
wait
29567 29563 29567 0 3 0x4080 sh
wait
29566 29562 29566 0 3 0x4080 sh
wait
29565 625 625 0 3 0x80 cron
piperd
29564 625 625 0 3 0x80 cron
piperd
29563 625 625 0 3 0x80 cron
piperd
29562 625 625 0 3 0x80 cron
piperd
29554 585 585 32767 3 0x180 httpd
wait
29549 585 585 32767 3 0x180 httpd
wait
29545 585 585 32767 3 0x180 httpd
wait
29543 585 585 32767 3 0x180 httpd
wait
29542 585 585 32767 3 0x100 httpd
vnlock
29513 585 585 32767 3 0x180 httpd
wait
29507 585 585 32767 3 0x180 httpd
wait
29497 585 585 32767 3 0x180 httpd
wait
29496 585 585 32767 3 0x180 httpd
wait
29490 585 585 32767 3 0x100 httpd
vnlock
29483 28669 28661 92 3 0 ksh
vnlock
28669 28662 28661 92 3 0x4080 ksh
pause
28662 28661 28661 92 3 0x4080 ksh
pause
28661 28656 28661 0 3 0x4080 sh
wait
28656 625 625 0 3 0x80 cron
piperd
28546 28540 28538 92 3 0x4000 ksh
vnlock
28540 28538 28538 92 3 0x4080 ksh
pause
28538 28534 28538 0 3 0x4080 sh
wait
28534 625 625 0 3 0x80 cron
piperd
28514 1693 28514 8629 3 0x4082 pine
select
28507 16174 16098 65525 3 0x82 ksh
getblk
28506 27764 19837 62035 3 0x2006 ksh
vnlock
28136 28071 28136 60719 3 0x4082 bash
ttyin
28071 578 28071 0 3 0x180 sshd
select
27764 19839 19837 62035 3 0x82 ksh
pause
26528 13417 13415 19503 3 0x4002 pico
vnlock
25483 1 25481 4588 3 0x4082 ksh
piperd
24651 1 24648 55270 3 0x4082 ksh
piperd
22497 585 585 32767 3 0x180 httpd
wait
22485 585 585 32767 3 0x180 httpd
wait
22484 585 585 32767 3 0x100 httpd
vnlock
22483 585 585 32767 3 0x180 httpd
wait
22482 585 585 32767 3 0x100 httpd
vnlock
22481 585 585 32767 3 0x180 httpd
wait
19839 1 19837 62035 3 0x4082 ksh
pause
18790 580 18790 0 3 0x180 sshd
select
18511 1 18511 53018 3 0x4002 tcsh
ppwait
16679 16639 16679 8537 3 0x4082 ksh
ttyin
16639 620 620 0 3 0x4080 telnetd
select
16184 16182 16098 65525 3 0x4082 dd
ttyin
16182 16181 16098 65525 3 0x4082 ksh
pause
16181 16179 16098 65525 3 0x4082 sh
wait
16179 16100 16098 65525 3 0x4082 getchar
wait
16174 16100 16098 65525 3 0x82 ksh
piperd
16100 16099 16098 65525 3 0x4082 ksh
piperd
16099 16098 16098 65525 3 0x4082 sh
wait
16098 16084 16098 65525 3 0x4182 com
wait
16084 16071 16084 65525 3 0x4082 ksh
pause
16071 578 16071 0 3 0x180 sshd
select
13417 13416 13415 19503 3 0x4082 ksh
pause
13416 13415 13415 19503 3 0x4082 sh
wait
13415 10348 13415 19503 3 0x4182 bboard
wait
13179 12984 13179 51494 3 0x4082 ksh
pause
12984 620 620 0 3 0x4080 telnetd
select
11953 10638 11953 57580 3 0x4082 pine
select
11067 5952 11067 4998 3 0x408f mutt
ttyin
10638 7059 10638 57580 3 0x4082 bash
wait
7059 578 7059 0 3 0x180 sshd
select
6814 6795 6814 8203 3 0x4082 bash
wait
6795 578 6795 0 3 0x180 sshd
select
1693 1682 1693 8629 3 0x4082 bash
wait
1682 620 620 0 3 0x4080 telnetd
select
26858 11278 26858 35058 3 0x408b mutt
wait
15082 3127 15082 52756 3 0x4082 pine
select
19058 19043 19058 40144 3 0x408f mutt
ttyin
19043 19017 19043 40144 3 0x4082 tcsh
pause
19017 578 19017 0 3 0x184 sshd
select
13323 13316 13323 34771 3 0x4084 scp
netio
13316 578 13316 0 3 0x184 sshd
select
10348 10342 10348 19503 3 0x4082 tcsh
pause
10342 578 10342 0 3 0x180 sshd
select
3127 3044 3127 52756 3 0x4082 ksh
pause
3044 578 3044 0 3 0x180 sshd
select
11278 11177 11278 35058 3 0x4082 ksh
pause
11177 578 11177 0 3 0x180 sshd
select
18862 18825 18862 49211 3 0x5002 emacs
vnlock
18854 1 18853 49211 3 0x82 twait
nanosle
18825 18658 18825 49211 3 0x4082 zsh
pause
18658 578 18658 0 3 0x180 sshd
select
13373 13347 13373 37847 3 0x408f mutt
ttyin
13347 13320 13347 37847 3 0x4082 bash
wait
13320 578 13320 0 3 0x184 sshd
select
5952 5925 5952 4998 3 0x4082 bash
wait
5925 578 5925 0 3 0x184 sshd
select
15654 630 15654 100 3 0x2 ksh
vnlock
22312 22303 22312 49614 3 0x4002 bash
vnlock
22303 578 22303 0 3 0x180 sshd
select
630 1 630 100 3 0x4086 ksh
pause
625 1 625 0 3 0x80 cron
nanosle
620 1 620 0 3 0x84 inetd
select
602 1 602 0 3 0x80 timed
select
585 1 585 0 3 0x84 httpd
select
580 1 580 0 3 0x180 sshd
select
578 1 578 0 3 0x180 sshd
select
142 1 142 0 3 0x80 rpc.lockd
select
126 1 126 0 3 0x80 xfs
select
101 1 101 0 3 0x84 ypbind
select
96 1 96 0 3 0x84 rpcbind
select
83 1 83 0 3 0x84 syslogd
select
49 0 0 0 3 0x20284 nfsio
nfsidl
48 0 0 0 3 0x20284 nfsio
nfsidl
47 0 0 0 3 0x20284 nfsio
nfsidl
46 0 0 0 3 0x20284 nfsio
nfsidl
45 0 0 0 3 0x20284 nfsio
nfsidl
44 0 0 0 3 0x20284 nfsio
nfsidl
43 0 0 0 3 0x20284 nfsio
nfsidl
42 0 0 0 3 0x20284 nfsio
nfsidl
41 0 0 0 3 0x20284 nfsio
nfsidl
40 0 0 0 3 0x20284 nfsio
nfsidl
5 0 0 0 3 0x20204 aiodoned
aiodone
4 0 0 0 3 0x20204 ioflush
syncer
3 0 0 0 3 0x20204 reaper
reaper
2 0 0 0 3 0x20204 pagedaemon
pgdaemo
1 0 1 0 3 0x4084 init
wait
0 -1 0 0 3 0x20204 swapper
schedul