So I've been meaning to test crash(8) for some time now, but had never found the occasion. Well today I found a good reason. I've been re-arranging some servers and ended up leaving some NFS mounts in place when I should not have. This of course causes no end of problems if the nightly cron jobs try to run as various processes can get stuck, e.g. the "find"s that run. The good news is that it is now possible to kill these processes (whereas in older releases they were forever unkillable despite using interruptable mounts). However "umount -f /nfs/mount" is still not working. It appears, from "ps", that it gets stuck in a socket call. # umount -f /future/build [ 2436610.5372572] load: 0.88 cmd: umount 2940 [tstile] 0.00u 0.00s 0% 236k [ 2436611.6273278] load: 0.88 cmd: umount 2940 [tstile] 0.00u 0.00s 0% 236k .... here I killed the stuck find processes from another window .... [ 2436643.7493935] load: 1.03 cmd: umount 2940 [nfscn2] 0.00u 0.00s 0% 236k [ 2436645.2894833] load: 1.03 cmd: umount 2940 [nfscn2] 0.00u 0.00s 0% 236k [ 2436788.9486762] load: 1.00 cmd: umount 2940 [nfscn2] 0.00u 0.01s 0% 236k [ 2436794.2290140] load: 1.00 cmd: umount 2940 [nfscn2] 0.00u 0.01s 0% 236k [ 2437106.0289114] load: 1.04 cmd: umount 2940 [nfscn2] 0.00u 0.02s 0% 236k [ 2441119.0864077] load: 1.00 cmd: umount 2940 [nfscn2] 0.00u 0.14s 0% 236k [ 2441394.6442117] load: 1.08 cmd: umount 2940 [nfscn2] 0.00u 0.16s 0% 236k Note also that fstat(1) is entirely useless at finding those stuck processes as it just gets stuck itself: # fstat /future/build [ 2436819.2006111] load: 1.22 cmd: fstat 6722 [tstile] 0.00u 0.00s 0% 440k [ 2436820.8607171] load: 1.22 cmd: fstat 6722 [tstile] 0.00u 0.00s 0% 440k Note as you'll see below this particular example is from a not-very-current current system, but I see the exact same behaviour on a stock 9.0 and on a much more current 9.99.64. So, let's see if we can figure out why "umount -f" isn't forcing things: 12:00 [1.60] # ps alx UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND 0 0 0 0 125 0 0 159628 uvm DKl ? 1013:51.76 [system] 0 1 0 671 85 0 3968 472 wait Is ? 0:00.01 init 32767 187 1 0 85 0 3060 1272 select Ss ? 1:05.62 /usr/sbin/mdnsd 0 249 1 0 85 0 8732 1216 kqueue Ss ? 0:06.69 /usr/sbin/syslogd -s 0 283 1 0 85 0 26552 1200 nanoslp Ss ? 0:08.00 /usr/pkg/sbin/xenwatchdogd 30 15 19 528 1 0 85 0 5256 480 select Ss ? 1:38.50 /usr/sbin/rwhod -u _rwhod 0 563 1 0 85 0 46832 1944 select Isl ? 0:00.00 /usr/pkg/sbin/xl create /etc/xen/more.conf 0 574 1 0 85 0 26552 984 nanoslp Ss ? 0:07.99 /usr/pkg/sbin/xenwatchdogd 30 15 0 591 1 0 85 0 33000 1864 select Sl ? 63:21.44 /usr/pkg/sbin/xenconsoled --log=guest 12 595 913 0 85 0 6216 1228 kqueue I ? 0:02.83 qmgr -l -t unix -u 0 599 1 0 85 0 9504 8712 pause Ss ? 2:51.72 /usr/sbin/ntpd -p /var/run/ntpd.pid -g 1000 609 1906 0 85 0 13660 7076 select Is ? 0:00.81 xterm -class UXTerm 0 611 1 20 85 0 2344 316 kqueue Is ? 0:00.00 /usr/sbin/powerd 0 622 1 4 85 0 6756 1076 select Is ? 0:00.00 /usr/sbin/sshd 0 650 1 0 85 0 31720 1884 select I ? 0:08.23 /usr/pkg/sbin/xenstored --pid-file /var/run/xenstored.pi 0 658 1 0 85 0 26512 3044 select Is ? 0:00.10 conserver: master: port 782, 1 local, 0 remote 0 913 1 0 85 0 2788 864 kqueue Is ? 0:11.68 /usr/libexec/postfix/master -w 0 997 1 0 85 0 2544 620 nanoslp Is ? 0:05.55 /usr/sbin/cron 1000 1002 13272 174 85 0 21204 4344 select Ss ? 0:00.15 xterm -class UXTerm 0 1004 1 0 85 0 2796 864 kqueue Ss ? 0:00.02 /usr/sbin/inetd -l 0 1906 1004 0 85 0 2624 684 select I ? 0:00.01 rshd -L 0 3812 1004 0 85 0 2624 684 select I ? 0:00.01 rshd -L 1000 4007 3812 0 85 0 26472 20676 select Is ? 0:14.22 xterm -class UXTerm 0 5999 1004 0 85 0 2620 648 select I ? 0:00.01 rshd -L 0 8861 658 0 85 0 26512 3124 select S ? 0:04.76 conserver: group 1: port 65506, 1 console 1000 9003 16131 0 85 0 13648 1680 select Ss ? 0:02.53 xload 0 13272 1004 699 85 0 6124 652 select S ? 0:00.01 rshd -L 1000 14064 5999 0 85 0 21428 7896 select Ss ? 0:00.38 xterm -class UXTerm 0 15270 1004 0 85 0 6028 652 select I ? 0:00.01 rshd -L 12 16008 913 0 85 0 7860 1140 kqueue I ? 0:00.01 pickup -l -t unix -u 0 16131 1004 0 85 0 2620 644 select I ? 0:00.01 rshd -L 1000 20090 15270 0 85 0 25056 7028 select Is ? 0:00.24 xterm -class UXTerm 1000 1768 609 0 85 0 11708 1056 ttyraw Is+ pts/1 0:00.09 -ksh 0 2940 7584 0 117 0 17264 236 nfscn2 D+ pts/2 0:00.14 umount -f /future/build 1000 4103 4007 0 85 0 4120 1064 pause Is pts/2 0:00.09 -ksh 0 7584 4103 0 85 0 7656 1088 pause I pts/2 0:00.35 ksh 0 6722 27639 0 127 0 16768 440 tstile D+ pts/3 0:00.00 fstat /future/build 1000 21172 14064 489 85 0 3728 1060 pause Is pts/3 0:00.09 -ksh 0 27639 21172 0 85 0 9648 1048 pause I pts/3 0:00.08 ksh 1000 13000 20090 722 85 0 3600 1056 ttyraw Is+ pts/4 0:00.11 -ksh 0 3707 19523 1057 85 0 11736 1044 pause S pts/5 0:00.08 ksh 0 4176 3707 1057 43 0 12900 624 - O+ pts/5 0:00.00 ps -alx 1000 19523 1002 3550 85 0 3188 1056 pause Ss pts/5 0:00.09 -ksh 0 1013 1 527 85 0 2660 652 ttyraw Is+ ttyE0 0:00.08 -ksh 0 822 1 126 85 0 2524 412 ttyraw Is+ ttyE1 0:00.00 /usr/libexec/getty Ws ttyE1 0 828 1 126 85 0 2524 416 ttyraw Is+ ttyE2 0:00.00 /usr/libexec/getty Ws ttyE2 0 957 1 126 85 0 2528 412 ttyraw Is+ ttyE3 0:00.00 /usr/libexec/getty Ws ttyE3 0 862 1 126 85 0 4188 408 ttyraw Is+ ttyE4 0:00.00 /usr/libexec/getty Ws ttyE4 0 1023 1 126 85 0 2524 424 ttyraw Is+ ttyE5 0:00.00 /usr/libexec/getty Ws ttyE5 0 1050 1 126 85 0 2524 428 ttyraw Is+ ttyE6 0:00.00 /usr/libexec/getty Ws ttyE6 0 668 1 0 85 0 2528 416 ttyraw Is+ xencons 0:00.01 /usr/libexec/getty console constty 12:00 [1.61] # crash Crash version 8.99.32, image version 8.99.32. Output from a running system is unreliable. crash> bt /t 0t2940 trace: pid 2940 lid 1 at 0xffffaf808a4748f0 sleepq_block() at sleepq_block+0xfd kpause() at kpause+0xdf nfs_reconnect() at nfs_reconnect+0x8b nfs_request() at nfs_request+0xf3a nfs_getattr() at nfs_getattr+0x175 VOP_GETATTR() at VOP_GETATTR+0x49 vn_stat() at vn_stat+0x3d do_sys_statat() at do_sys_statat+0x97 sys___lstat50() at sys___lstat50+0x25 syscall() at syscall+0x9c --- syscall (number 441) --- 43292a: crash> bt /t 0t6722 trace: pid 6722 lid 1 at 0xffffaf808a488920 sleepq_block() at sleepq_block+0x99 turnstile_block() at turnstile_block+0x337 rw_vector_enter() at rw_vector_enter+0x169 genfs_lock() at genfs_lock+0x3c VOP_LOCK() at VOP_LOCK+0x71 vn_lock() at vn_lock+0x90 nfs_root() at nfs_root+0x2b lookup_once() at lookup_once+0x38e namei_tryemulroot() at namei_tryemulroot+0x453 namei() at namei+0x29 fd_nameiat.isra.2() at fd_nameiat.isra.2+0x54 do_sys_statat() at do_sys_statat+0x87 sys___stat50() at sys___stat50+0x28 syscall() at syscall+0x9c --- syscall (number 439) --- 43c94a: crash> So it would seem that even though umount is trying to force an unmount of an NFS mount, the kernel is first trying to reconnect to the server! BTW, I have another system running a quite recent i386 build where crash(8) is unable to do a backtrace: # ktrace crash Crash version 9.99.64, image version 9.99.64. Kernel compiled without options LOCKDEBUG. Output from a running system is unreliable. crash> trace /t 0t4003 crash: kvm_read(0x4, 4): kvm_read: Bad address trace: pid 4003 lid 4003 crash> -- Greg A. Woods <gwoods%acm.org@localhost> Kelowna, BC +1 250 762-7675 RoboHack <woods%robohack.ca@localhost> Planix, Inc. <woods%planix.com@localhost> Avoncote Farms <woods%avoncote.ca@localhost>
Attachment:
pgpNhjdtD4u4f.pgp
Description: PGP signature