NetBSD-Bugs archive


kern/53385: vnconfig deadlock on fstchg



>Number:         53385
>Category:       kern
>Synopsis:       vnconfig deadlock on fstchg
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 19 20:55:01 +0000 2018
>Originator:     Manuel Bouyer
>Release:        NetBSD 8.0_RC1
>Organization:
>Environment:
System: NetBSD admin2-dom0.lip6.fr 8.0_RC1 NetBSD 8.0_RC1 (ADMIN_DOM0) #1: Mon Jun 11 11:32:45 MEST 2018 bouyer%armandeche.soc.lip6.fr@localhost:/local/armandeche1/tmp/build/amd64/obj/local/armandeche1/netbsd-8/src/sys/arch/amd64/compile/ADMIN_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
	This is on a NetBSD/Xen dom0 host. A domU with two file-backed disks
	was destroyed, and I suspect the scripts ran the two vnconfig -u
	commands in parallel. This resulted in I/O stalls, with most
	processes waiting on fstchg.
	Here is the ps output from ddb:

PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
8753     1 3   0         0   ffffa00008091660               sshd fstchg
9834     1 3   0         0   ffffa00008091a80               sshd fstchg
9950     1 3   0         0   ffffa00008076640               sshd fstchg
9590     1 3   0   1000000   ffffa00008073200               tcsh fstchg
6430     1 3   0         0   ffffa00008073620               sshd fstchg
8033     1 3   0         0   ffffa00008076a60               sshd fstchg
3594     1 3   0   1000000   ffffa00008076220               tcsh fstchg
5271     1 3   0         0   ffffa00003f21980                 xl fstchg
12415    1 3   0         0   ffffa00003d0d680           vnconfig biowait
11058    1 3   0        80   ffffa00003f1d960                 sh wait
13112    1 3   0        80   ffffa00003d1cac0           vnconfig fstcnt
5137     1 3   0        80   ffffa00003f2d180                 sh wait
9645     2 3   0        80   ffffa00003f0c0c0                 xl netio
9645     1 3   0         0   ffffa00003f0c900                 xl fstchg
13935    1 3   0         0   ffffa00003f671e0               tcsh wait
13693    1 3   0        80   ffffa00008073a40                ksh pause
12178    1 3   0        80   ffffa00003d1c280               tcsh pause
9584     1 3   0        80   ffffa00003dc2ae0               sshd select
10645    1 3   0        80   ffffa00003f67600               sshd select
10534    1 3   0        80   ffffa00003f289a0             pickup kqueue
8128     1 3   0        80   ffffa00003f10500               tcsh ttyraw
9113     1 3   0        80   ffffa00003ee10a0               sshd select
9735     1 3   0        80   ffffa00003e59460               sshd select
11149    1 3   0        80   ffffa00003f2d9c0               tcsh ttyraw
8069     1 3   0        80   ffffa00002aef6a0                ksh pause
6550     1 3   0        80   ffffa00003cef660               sshd select
12226    3 3   0   1000080   ffffa00003f1d540            qemu-dm netio
12226    2 3   0   1000080   ffffa00003e53020            qemu-dm netio
12226    1 3   0   1000000   ffffa00002f8b140            qemu-dm fstchg
3406     1 3   0        80   ffffa00002adf260               tcsh ttyraw
1451     1 3   0        80   ffffa00003e4c000                ksh pause
7092     1 3   0        80   ffffa000033955a0               tcsh pause
4314     1 3   0        80   ffffa00002aefac0       screen-4.6.2 select
4212     1 3   0        80   ffffa00002adfaa0              getty ttyraw
3519     1 3   0        80   ffffa00002adf680              getty ttyraw
6078     1 3   0        80   ffffa00002819240              getty ttyraw
7444     1 3   0         0   ffffa00002817a60              getty fstchg
6455     1 3   0         0   ffffa00003dcdb00               cron fstchg
3915     1 3   0        80   ffffa00003f21140              inetd kqueue
5033     1 3   0        80   ffffa00003e4c420               qmgr kqueue
4296     1 3   0        80   ffffa00003e5f8a0             master kqueue
2276     1 3   0        80   ffffa00003f21560             smartd nanoslp
1641     1 3   0        80   ffffa00003f335c0             upsmon nanoslp
2908     1 3   0        80   ffffa00003f17520             upsmon pipe_rd
5241     2 3   0        80   ffffa00003395180                 xl netio
5241     1 3   0        80   ffffa00003dfcb40                 xl select
6252     1 3   0         0   ffffa00003e17740               tcsh wait
5508     2 3   0        80   ffffa00003f1d120                 xl netio
5508     1 3   0        80   ffffa00003e59880                 xl select
5875     3 3   0   1000080   ffffa00003e47ba0            qemu-dm netio
5875     2 3   0   1000080   ffffa00003f10920            qemu-dm netio
5875     1 3   0   1000000   ffffa00003dcd2c0            qemu-dm fstchg
3821     1 3   0        80   ffffa00003dcd6e0                ksh pause
3651     2 3   0        80   ffffa00003f3aa00                 xl netio
3651     1 3   0        80   ffffa00003dc22a0                 xl select
3736     3 3   0   1000080   ffffa00003e30340            qemu-dm netio
3736     2 3   0   1000080   ffffa00003de12e0            qemu-dm netio
3736     1 3   0        80   ffffa00003de1700            qemu-dm select
6648     2 3   0        80   ffffa00003f339e0                 xl netio
6648     1 3   0        80   ffffa00003dfc300                 xl select
5294     3 3   0   1000080   ffffa00003e59040            qemu-dm netio
5294     2 3   0   1000080   ffffa00003f100e0            qemu-dm netio
5294     1 3   0        80   ffffa00003f331a0            qemu-dm select
3797     2 3   0        80   ffffa00003e47360                 xl netio
3797     1 3   0        80   ffffa00003e5f480                 xl select
6759     1 3   0        80   ffffa00003f3a5e0               tcsh pause
2896     1 3   0        80   ffffa00003f0c4e0               sshd select
5811     2 3   0        80   ffffa00003dc26c0                 xl netio
5811     1 3   0        80   ffffa00003de1b20                 xl select
1745     1 3   0        80   ffffa00003e17320               sshd select
4614     2 3   0        80   ffffa00003ee14c0                 xl netio
4614     1 3   0        80   ffffa00003e30b80                 xl select
6813     2 3   0        80   ffffa00003e47780                 xl netio
6813     1 3   0        80   ffffa00003ee18e0                 xl select
161      2 3   0        80   ffffa00003cef240                 xl netio
161      1 3   0        80   ffffa00003d0d260                 xl select
2211     2 3   0        80   ffffa00003cdd640                 xl netio
2211     1 3   0        80   ffffa00003cbe200                 xl select
1717     2 3   0        80   ffffa00003cbea40                 xl netio
1717     1 3   0        80   ffffa00003b2b1e0                 xl select
1790     2 3   0        80   ffffa00003b175e0                 xl netio
1790     1 3   0        80   ffffa00003b171c0                 xl select
1646     2 3   0        80   ffffa00002f93580        xenconsoled netio
1646     1 3   0        80   ffffa00002aef280        xenconsoled select
1613     1 3   0        80   ffffa000033e11a0          xenstored select
1627     1 3   0        80   ffffa000033e19e0               sshd select
1622     1 3   0        80   ffffa00002f93160             powerd kqueue
1605     1 3   0        80   ffffa000033959c0               ntpd pause
1564     1 3   0         0   ffffa00002f8b980              ipmon fstchg
1476     1 2   0         0   ffffa00002f108a0            syslogd
1        1 3   0        80   ffffa000026651e0               init wait
0      183 5   0       200   ffffa00003f28580           (zombie)
0      182 3   0       200   ffffa00003f17940              vnd13 fstchg
0      181 3   0       200   ffffa00003e5f060              vnd12 vndbp
0      160 3   0       200   ffffa00003e944a0       bridge_rtage bridge_rtage
0      159 3   0       200   ffffa00003f2d5a0       xbdb13i51712 xbdb13i51712
0      158 3   0       200   ffffa00003e94080               vnd8 fstchg
0      144 3   0       200   ffffa00003e948c0         xbdb11i768 xbdb11i768
0      143 3   0       200   ffffa00003f67a20       xbdb10i51712 xbdb10i51712
0      142 3   0       200   ffffa00002d74b00           xbdb12i1 xbdb12i1
0      140 3   0       200   ffffa00003f3a1c0              vnd11 fstchg
0      139 3   0       200   ffffa00003e53440              vnd10 fstchg
0      138 3   0       200   ffffa00003f28160               vnd9 vndbp
0      136 3   0       200   ffffa00003e4c840            xbdb8i1 xbdb8i1
0      135 3   0       200   ffffa00003e30760            xbdb7i1 xbdb7i1
0      134 3   0       200   ffffa00003e53860               vnd7 fstchg
0      133 3   0       200   ffffa00003f17100               vnd6 fstchg
0      132 3   0       200   ffffa00003b17a00            xbdb6i1 xbdb6i1
0      131 3   0       200   ffffa00003e17b60            xbdb5i1 xbdb5i1
0      130 3   0       200   ffffa00003dfc720               vnd5 fstchg
0      129 3   0       200   ffffa00003b2b600            xbdb4i1 xbdb4i1
0      128 3   0       200   ffffa00003d1c6a0               vnd4 fstchg
0      127 3   0       200   ffffa00003cefa80            xbdb3i1 xbdb3i1
0      126 3   0       200   ffffa00003d0daa0               vnd3 fstchg
0      125 3   0       200   ffffa00003cdda60            xbdb2i1 xbdb2i1
0      124 3   0       200   ffffa00003cdd220               vnd2 fstchg
0      123 3   0       200   ffffa00003b2ba20            xbdb1i1 xbdb1i1
0      122 3   0       200   ffffa00003cbe620               vnd1 fstchg
0      121 3   0       200   ffffa000022a78e0               vnd0 fstchg
0      120 3   0       200   ffffa000033e15c0        xen_balloon xen_balloon
0      119 3   0       200   ffffa00002f8b560              ipmi0 ipmi0
0      118 3   0       200   ffffa00002f939a0       bridge_rtage bridge_rtage
0      117 3   0       200   ffffa00002f84960       bridge_rtage bridge_rtage
0      116 3   0       200   ffffa00002f84540       bridge_rtage bridge_rtage
0      115 3   0       200   ffffa00002f84120       bridge_rtage bridge_rtage
0      114 3   0       200   ffffa00002f7b100       bridge_rtage bridge_rtage
0      113 3   0       200   ffffa00002f7b520       bridge_rtage bridge_rtage
0      112 3   0       200   ffffa00002f7b940       bridge_rtage bridge_rtage
0      111 3   0       200   ffffa00002f720e0       bridge_rtage bridge_rtage
0      110 3   0       200   ffffa00002f72500       bridge_rtage bridge_rtage
0      109 3   0       200   ffffa00002f72920       bridge_rtage bridge_rtage
0      108 3   0       200   ffffa00002f690c0       bridge_rtage bridge_rtage
0      107 3   0       200   ffffa00002f694e0       bridge_rtage bridge_rtage
0      106 3   0       200   ffffa00002f69900       bridge_rtage bridge_rtage
0      105 3   0       200   ffffa00002f610a0       bridge_rtage bridge_rtage
0      104 3   0       200   ffffa00002f614c0       bridge_rtage bridge_rtage
0      103 3   0       200   ffffa00002f618e0       bridge_rtage bridge_rtage
0      102 3   0       200   ffffa00002f18080       bridge_rtage bridge_rtage
0      101 3   0       200   ffffa00002f184a0       bridge_rtage bridge_rtage
0      100 3   0       200   ffffa00002f188c0       bridge_rtage bridge_rtage
0       99 3   0       200   ffffa00002f10060       bridge_rtage bridge_rtage
0       98 3   0       200   ffffa00002f10480       bridge_rtage bridge_rtage
0       97 3   0       200   ffffa00002ec92e0       bridge_rtage bridge_rtage
0       96 3   0       200   ffffa00002f07040       bridge_rtage bridge_rtage
0       95 3   0       200   ffffa00002f07460       bridge_rtage bridge_rtage
0       94 3   0       200   ffffa00002f07880       bridge_rtage bridge_rtage
0       93 3   0       200   ffffa00002efe020       bridge_rtage bridge_rtage
0       92 3   0       200   ffffa00002efe440       bridge_rtage bridge_rtage
0       91 3   0       200   ffffa00002efe860       bridge_rtage bridge_rtage
0       90 3   0       200   ffffa00002ef5000       bridge_rtage bridge_rtage
0       89 3   0       200   ffffa00002ef5420       bridge_rtage bridge_rtage
0       88 3   0       200   ffffa00002ef5840       bridge_rtage bridge_rtage
0       87 3   0       200   ffffa00002eec360       bridge_rtage bridge_rtage
0       86 3   0       200   ffffa00002ee4340       bridge_rtage bridge_rtage
0       85 3   0       200   ffffa00002eecba0       bridge_rtage bridge_rtage
0       84 3   0       200   ffffa00002eec780       bridge_rtage bridge_rtage
0       83 3   0       200   ffffa00002ee4760       bridge_rtage bridge_rtage
0       82 3   0       200   ffffa00002ee4b80       bridge_rtage bridge_rtage
0       81 3   0       200   ffffa00002edb320       bridge_rtage bridge_rtage
0       80 3   0       200   ffffa00002ed1720       bridge_rtage bridge_rtage
0       79 3   0       200   ffffa00002edbb60       bridge_rtage bridge_rtage
0       78 3   0       200   ffffa00002edb740       bridge_rtage bridge_rtage
0       77 3   0       200   ffffa00002d746e0       bridge_rtage bridge_rtage
0       76 3   0       200   ffffa00002ed1b40       bridge_rtage bridge_rtage
0       75 3   0       200   ffffa00002ed1300       bridge_rtage bridge_rtage
0       74 3   0       200   ffffa00002ec9700       bridge_rtage bridge_rtage
0       73 3   0       200   ffffa00002b0c2a0       bridge_rtage bridge_rtage
0       72 3   0       200   ffffa00002d742c0       bridge_rtage bridge_rtage
0       71 3   0       200   ffffa00002ec9b20       bridge_rtage bridge_rtage
0       70 3   0       200   ffffa00002b0c6c0       bridge_rtage bridge_rtage
0       69 3   0       200   ffffa00002b0cae0       bridge_rtage bridge_rtage
0       68 3   0       200   ffffa00002817640            physiod physiod
0       67 3   0       200   ffffa00002819660           aiodoned aiodoned
0       66 3   0       200   ffffa00002819a80            ioflush fstchg
0       65 3   0       200   ffffa00002817220           pgdaemon pgdaemon
0       62 3   0       200   ffffa00002759200            raidio0 raidiow
0       61 3   0       200   ffffa00002637140              raid0 rfnodeq
0       60 3   0       200   ffffa00002759620          atapibus0 sccomp
0       56 3   0       200   ffffa00002637560               usb7 usbevt
0       55 3   0       200   ffffa00002637980               usb6 usbevt
0       54 3   0       200   ffffa00002636120               usb5 usbevt
0       53 3   0       200   ffffa00002636540               usb4 usbevt
0       52 3   0       200   ffffa00002636960               usb3 usbevt
0       51 3   0       200   ffffa0000262e100               usb2 usbevt
0       50 3   0       200   ffffa0000262e520               usb1 usbevt
0       49 3   0       200   ffffa00002759a40               usb0 usbevt
0       48 3   0       200   ffffa00002665600            rt_free rt_free
0       47 3   0       200   ffffa00002665a20              unpgc unpgc
0       46 3   0       200   ffffa0000265f1c0    key_timehandler key_timehandler
0       45 3   0       200   ffffa0000265f5e0    icmp6_wqinput/0 icmp6_wqinput
0       44 3   0       200   ffffa0000265fa00    ip6flow_slowtim ip6flow_slowtimo
0       43 3   0       200   ffffa0000265c1a0          nd6_timer nd6_timer
0       42 3   0       200   ffffa0000265c5c0    carp6_wqinput/0 carp6_wqinput
0       41 3   0       200   ffffa0000265c9e0     carp_wqinput/0 carp_wqinput
0       40 3   0       200   ffffa00002651180     icmp_wqinput/0 icmp_wqinput
0       39 3   0       200   ffffa000026515a0           rt_timer rt_timer
0       38 3   0       200   ffffa000026519c0    ipflow_slowtimo ipflow_slowtimo
0       37 3   0       200   ffffa00002639160        vmem_rehash vmem_rehash
0       36 3   0       200   ffffa00002639580             xenbus rdst
0       35 3   0       200   ffffa000026399a0           xenwatch evtsq
0       26 3   0       200   ffffa0000262e940               iic0 iicintr
0       25 3   0       200   ffffa000024c80e0            atabus5 atath
0       24 3   0       200   ffffa000024c8500            atabus4 atath
0       23 3   0       200   ffffa000024c8920            atabus3 atath
0       22 3   0       200   ffffa000024bd0c0            atabus2 atath
0       21 3   0       200   ffffa000024bd4e0            atabus1 atath
0       20 3   0       200   ffffa000024bd900            atabus0 atath
0       19 3   0       200   ffffa000022a70a0         usbtask-dr usbtsk
0       18 3   0       200   ffffa000022a74c0         usbtask-hc usbtsk
0       16 3   0       200   ffffa0000208c080               ipmi ipmipoll
0       15 3   0       200   ffffa0000208c4a0             sysmon smtaskq
0       14 3   0       200   ffffa0000208c8c0         pmfsuspend pmfsuspend
0       13 3   0       200   ffffa00002086060           pmfevent pmfevent
0       12 3   0       200   ffffa00002086480         sopendfree sopendfr
0       11 3   0       200   ffffa000020868a0           nfssilly nfssilly
0       10 3   0       200   ffffa00001cf0040            cachegc cachegc
(Unfortunately, the Xen console buffer isn't large enough to hold the
complete ps output.)

Note that one vnconfig is sleeping on biowait and the other on fstcnt. Most
other processes are on fstchg, presumably because one of the vnconfig
processes is suspending a file system (note vfs_suspend() in its trace).

Traces of the two vnconfig processes:

db> tr/a ffffa00003d0d680
trace: pid 12415 lid 1 at 0xffffa0004e709940
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
biowait() at netbsd:biowait+0x4f
validate_label() at netbsd:validate_label+0x2a
readdisklabel() at netbsd:readdisklabel+0x1bc
vndopen() at netbsd:vndopen+0x2db
spec_open() at netbsd:spec_open+0x385
VOP_OPEN() at netbsd:VOP_OPEN+0x2f
vn_open() at netbsd:vn_open+0x1e9
do_open() at netbsd:do_open+0x112
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9c
--- syscall (number 5) ---
758b2943e2ca:
db> tr/a ffffa00003d1cac0
trace: pid 13112 lid 1 at 0xffffa0004e744860
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait_sig() at netbsd:cv_wait_sig+0xf4
fstrans_setstate() at netbsd:fstrans_setstate+0x9f
genfs_suspendctl() at netbsd:genfs_suspendctl+0x57
vfs_suspend() at netbsd:vfs_suspend+0x5b
vrevoke_suspend_next() at netbsd:vrevoke_suspend_next+0x2a
vrevoke() at netbsd:vrevoke+0x2b
genfs_revoke() at netbsd:genfs_revoke+0x13
VOP_REVOKE() at netbsd:VOP_REVOKE+0x2e
vdevgone() at netbsd:vdevgone+0x5a
vnddoclear() at netbsd:vnddoclear+0xb9
vndioctl() at netbsd:vndioctl+0x361
VOP_IOCTL() at netbsd:VOP_IOCTL+0x37
vn_ioctl() at netbsd:vn_ioctl+0xa6
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x9c
--- syscall (number 54) ---
7d6abdcfedda:

I suspect the first vnconfig (pid 12415) is stuck on biowait because the
underlying file system is suspended. There are lots of vnd threads stuck on
fstchg, for example:
db> tr/a ffffa00003f17940
trace: pid 0 lid 182 at 0xffffa0004c8fd4f0
sleepq_block() at netbsd:sleepq_block+0x99
cv_wait() at netbsd:cv_wait+0xf0
fstrans_start() at netbsd:fstrans_start+0x78e
VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x42
genfs_getpages() at netbsd:genfs_getpages+0x1344
VOP_GETPAGES() at netbsd:VOP_GETPAGES+0x4b
ubc_fault() at netbsd:ubc_fault+0x188
uvm_fault_internal() at netbsd:uvm_fault_internal+0x6d4
trap() at netbsd:trap+0x3c1
--- trap (number 6) ---
kcopy() at netbsd:kcopy+0x15
uiomove() at netbsd:uiomove+0xb9
ubc_uiomove() at netbsd:ubc_uiomove+0xf7
ffs_read() at netbsd:ffs_read+0xf7
VOP_READ() at netbsd:VOP_READ+0x33
vn_rdwr() at netbsd:vn_rdwr+0x10c
vndthread() at netbsd:vndthread+0x4a7
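Taken together, these traces suggest a wait cycle, though they do not prove one. Pid 13112 is suspending a file system and sleeps on fstcnt until existing transactions on it drain. The vnd threads sleep on fstchg until that suspension finishes, so they cannot complete I/O. Pid 12415 sleeps on biowait for vnd I/O. If an LWP in this chain holds the outstanding transaction, none of them can make progress.

The C sketch below is a hypothetical user-space model of that wait-for graph, not NetBSD kernel code. Each blocked thread waits on exactly one other thread, so following the edges either reaches a runnable thread or loops forever. The edge from 13112 to 12415 is an assumption: the ddb output does not show which LWP holds the transaction that 13112 is waiting for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of three of the threads seen in the ddb output. */
enum { VNCONFIG_SUSPEND, VNCONFIG_OPEN, VND_WORKER, NTHREADS };

static const char *const names[NTHREADS] = {
	"vnconfig 13112 (fstcnt)",
	"vnconfig 12415 (biowait)",
	"vnd thread (fstchg)",
};

/*
 * waits_for[i] is the thread that thread i is blocked on, or -1 if
 * thread i can run.  With at most one outgoing edge per thread, a walk
 * of n edges that never reaches a runnable thread must have revisited
 * some thread, i.e. entered a cycle: thread 'start' is stuck forever.
 */
static bool waits_forever(const int waits_for[], int n, int start)
{
	assert(start >= 0 && start < n);
	int cur = start;
	for (int hops = 0; hops < n; hops++) {
		if (waits_for[cur] < 0)
			return false;	/* reached a runnable thread */
		cur = waits_for[cur];
	}
	return true;
}

/* Print the chain of waits starting at 'start', stopping after n hops. */
static void print_wait_chain(const int waits_for[], int n, int start)
{
	assert(n == NTHREADS && start >= 0 && start < n);
	int cur = start;
	printf("%s", names[cur]);
	for (int hops = 0; hops < n && waits_for[cur] >= 0; hops++) {
		cur = waits_for[cur];
		printf(" -> %s", names[cur]);
	}
	putchar('\n');
}
```

With waits_for = { VNCONFIG_OPEN, VND_WORKER, VNCONFIG_SUSPEND }, waits_forever() is true for every thread, and print_wait_chain(waits_for, NTHREADS, VNCONFIG_SUSPEND) prints the loop from 13112 back to itself. In this model, unblocking any single edge (for example waits_for[VND_WORKER] = -1) breaks the cycle.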


>How-To-Repeat:
	Destroy a domU with two file-backed disks? Or run multiple vnconfig -u
	commands concurrently?
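	The second suggestion can be tried with a small harness that starts
	several vnconfig -u commands at nearly the same moment. This is a
	hypothetical sketch, not an existing NetBSD tool: the helper name is
	made up. It fork()s one child per command, holds every child at a
	pipe "gate" so they exec together, and then reaps them.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start every command in cmds[] in its own child, hold all children at
 * a pipe gate so they exec at nearly the same moment, then reap them.
 * Returns how many commands failed to run or exited non-zero, or -1 if
 * the gate could not be created.
 */
static int run_concurrently(char *const *const cmds[], int ncmds)
{
	assert(ncmds >= 0);

	int gate[2];
	if (pipe(gate) == -1) {
		perror("pipe");
		return -1;
	}

	int started = 0;
	for (int i = 0; i < ncmds; i++) {
		pid_t pid = fork();
		if (pid == -1) {
			perror("fork");
			break;
		}
		if (pid == 0) {
			char c;
			close(gate[1]);
			/* read() returns 0 (EOF) once no write end is open. */
			ssize_t r = read(gate[0], &c, 1);
			(void)r;
			close(gate[0]);
			execvp(cmds[i][0], cmds[i]);
			perror("execvp");
			_exit(127);
		}
		started++;
	}

	/*
	 * Close the parent's write end; once each child has also closed
	 * its copy, every read() sees EOF and the children exec together.
	 */
	close(gate[0]);
	close(gate[1]);

	int failures = ncmds - started;	/* commands that were never forked */
	for (int i = 0; i < started; i++) {
		int status;
		if (wait(&status) == -1 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failures++;
	}
	return failures;
}
```

	For this report each vector would be a detach such as
	{ "vnconfig", "-u", "vnd0", NULL } and { "vnconfig", "-u", "vnd1", NULL },
	where vnd0 and vnd1 are placeholders for the units backing the domU's
	disks. Given the hang described above, run it only on a machine you
	can afford to reboot.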
>Fix:
	Workaround: make sure the Xen scripts never run more than one
	vnconfig -u at a time. A real fix is still needed.


