NetBSD-Bugs archive
kern/50375: layerfs (nullfs) locking problem leading to livelock
>Number: 50375
>Category: kern
>Synopsis: layerfs (nullfs) locking problem leading to livelock
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Oct 28 15:45:00 +0000 2015
>Originator: Jeff Rizzo
>Release: 7.99.21/evbarm
>Organization:
>Environment:
NetBSD jetson1.lan 7.99.21 NetBSD 7.99.21 (JETSONTK1) #9: Thu Oct 15 14:36:15 PDT 2015 riz%cassava.tastylime.net@localhost:/scratch/evbarm7/obj/sys/arch/evbarm/compile/JETSONTK1 evbarm
>Description:
Doing pbulk builds with nullfs mounts in chroots on a 4-core ARM (Tegra TK1) system, I very frequently see the build stop making progress, with a bunch of processes stuck in 'tstile'. This time I happened to notice that process 28575 was the first to enter tstile (see below).
When it does this, I can use crash and ddb to get info, and gdb against /dev/mem seems to work somewhat (not "info threads", though), but I have not been able to get a crash dump.
My interpretation of the debugging output below is that the "culprit" process is PID 14346, which was:
polkit 14346 0.0 0.1 4936 2552 ? D 7:15AM 0:00.50 /usr/bi
1001 14346 26273 34233 125 0 4936 2552 vnode D ? 0:00.50 /usr/bin/make _MAKE OPSYS OS_VERSION LOWER_OPSYS _PKGSRCDIR PKGTOOLS_VERSION _CC _PATH_ORIG _PKGSRC_BARRIER ALLOW_VULNERABLE_PACKAGES all
My understanding is that the next step would be to look at the individual frames of the backtrace of that process to figure out what vp is - I would appreciate suggestions for how to do this with the system live, using either ddb or gdb against /dev/mem. (Assume I don't know what I'm doing, and give me very specific instructions :)
crash> ps/l |grep tstile
29934 1 3 3 0 96f83460 sh tstile
23822 1 3 1 0 9357ce20 sh tstile
28524 1 3 0 0 93dea080 sh tstile
21780 1 3 3 0 93983120 sh tstile
28575 1 3 3 0 96f831a0 python3.4 tstile
2319 1 3 0 0 92fec960 gvfsd-trash tstile
0 67 3 2 200 91c733e0 ioflush tstile
0 9 3 0 200 91596840 vdrain tstile
crash> bt/a 96f831a0
trace: pid 28575 lid 1 at 0xa1f57aa4
0xa1f57aa4: mi_switch+0x10
0xa1f57ad4: sleepq_block+0xb4
0xa1f57b14: turnstile_block+0x318
0xa1f57b8c: rw_vector_enter+0x3c0
0xa1f57bbc: genfs_lock+0x68
0xa1f57be4: VOP_LOCK+0x40
0xa1f57c0c: layer_lock+0x44
0xa1f57c34: VOP_LOCK+0x40
0xa1f57c5c: vn_lock+0x88
0xa1f57cac: lookup_once+0x224
0xa1f57d7c: namei_tryemulroot+0x528
0xa1f57db4: namei+0x34
0xa1f57ddc: fd_nameiat.isra.0+0x64
0xa1f57e4c: do_sys_statat+0x84
0xa1f57f04: sys___stat50+0x2c
0xa1f57f7c: syscall+0xb8
0xa1f57fac: swi_handler+0xa0
crash> ps/w |grep tstile
29934 1 sh netbsd 27 tstile 922b78e4
23822 1 sh netbsd 27 tstile 922b78e4
28524 1 sh netbsd 27 tstile 935fb98c
21780 1 sh netbsd 27 tstile 951f781c
28575 1 python3.4 netbsd 27 tstile 92b1d834
2319 1 gvfsd-trash netbsd 43 tstile 922b78e4
0 67 system netbsd 124 tstile 951f781c
0 9 system netbsd 125 tstile 951f781c
db{3}> show lock 92b1d834
lock address : 0x0000000092b1d834 type : sleep/adaptive
initialized : 0x000000008136442c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
current cpu : 3 last held: 2
current lwp : 0x00000000915c10c0 last held: 0x0000000093450300
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count : 0x0000000093450300 flags : 0x0000000000000007
Turnstile chain at 0x81609eb0.
=> Turnstile at 0x9706bd90 (wrq=0x9706bda0, rdq=0x9706bda8).
=> 0 waiting readers:
=> 1 waiting writers: 0x96f831a0
db{3}> bt/a 0x0000000093450300
trace: pid 14346 lid 1 at 0x9d6218c4
0x9d6218c4: netbsd:mi_switch+0x10
0x9d6218f4: netbsd:sleepq_block+0xb4
0x9d62192c: netbsd:cv_wait+0x130
0x9d621954: netbsd:vwait+0x50
0x9d62197c: netbsd:vget+0xd4
0x9d6219e4: netbsd:vcache_get+0x158
0x9d621a14: netbsd:layer_node_create+0x2c
0x9d621a44: netbsd:layer_lookup+0xfc
0x9d621a7c: netbsd:VOP_LOOKUP+0x48
0x9d621bdc: netbsd:getcwd_common+0x258
0x9d621bfc: netbsd:vn_isunder+0x2c
0x9d621c4c: netbsd:lookup_once+0xfc
0x9d621d1c: netbsd:namei_tryemulroot+0x528
0x9d621d54: netbsd:namei+0x34
0x9d621e2c: netbsd:vn_open+0x94
0x9d621eac: netbsd:do_open+0xb0
0x9d621edc: netbsd:do_sys_openat+0x7c
0x9d621f04: netbsd:sys_open+0x38
0x9d621f7c: netbsd:syscall+0xb8
0x9d621fac: netbsd:swi_handler+0xa0
db{3}>
db{3}> show lock 922b78e4
lock address : 0x00000000922b78e4 type : sleep/adaptive
initialized : 0x000000008136442c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
current cpu : 3 last held: 0
current lwp : 0x00000000915c10c0 last held: 0x0000000093dea080
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count : 0x0000000093dea080 flags : 0x0000000000000007
Turnstile chain at 0x81609f60.
=> Turnstile at 0x9706b6c8 (wrq=0x9706b6d8, rdq=0x9706b6e0).
=> 0 waiting readers:
=> 3 waiting writers: 0x92fec960 0x9357ce20 0x96f83460
db{3}> show lock 935fb98c
lock address : 0x00000000935fb98c type : sleep/adaptive
initialized : 0x000000008136442c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
current cpu : 3 last held: 3
current lwp : 0x00000000915c10c0 last held: 0x0000000093983120
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count : 0x0000000093983120 flags : 0x0000000000000007
Turnstile chain at 0x8160a008.
=> Turnstile at 0x9706afc8 (wrq=0x9706afd8, rdq=0x9706afe0).
=> 0 waiting readers:
=> 1 waiting writers: 0x93dea080
db{3}> show lock 951f781c
lock address : 0x00000000951f781c type : sleep/adaptive
initialized : 0x000000008136442c
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 3
current cpu : 3 last held: 3
current lwp : 0x00000000915c10c0 last held: 0x0000000096f831a0
last locked* : 0x00000000813795f8 unlocked : 0x0000000081379714
owner/count : 0x0000000096f831a0 flags : 0x0000000000000007
Turnstile chain at 0x81609e98.
=> Turnstile at 0x9706af90 (wrq=0x9706afa0, rdq=0x9706afa8).
=> 0 waiting readers:
=> 3 waiting writers: 0x91596840 0x91c733e0 0x93983120
db{3}>
db{3}> bt/a 0x0000000093dea080
trace: pid 28524 lid 1 at 0xa4aa7aa4
0xa4aa7aa4: netbsd:mi_switch+0x10
0xa4aa7ad4: netbsd:sleepq_block+0xb4
0xa4aa7b14: netbsd:turnstile_block+0x318
0xa4aa7b8c: netbsd:rw_enter+0x3c0
0xa4aa7bbc: netbsd:genfs_lock+0x68
0xa4aa7be4: netbsd:VOP_LOCK+0x40
0xa4aa7c0c: netbsd:layer_lock+0x44
0xa4aa7c34: netbsd:VOP_LOCK+0x40
0xa4aa7c5c: netbsd:vn_lock+0x88
0xa4aa7cac: netbsd:lookup_once+0x224
0xa4aa7d7c: netbsd:namei_tryemulroot+0x528
0xa4aa7db4: netbsd:namei+0x34
0xa4aa7ddc: netbsd:fd_nameiat.isra.0+0x64
0xa4aa7e4c: netbsd:do_sys_statat+0x84
0xa4aa7f04: netbsd:sys___stat50+0x2c
0xa4aa7f7c: netbsd:syscall+0xb8
0xa4aa7fac: netbsd:swi_handler+0xa0
db{3}> bt/a 0x0000000093983120
trace: pid 21780 lid 1 at 0x9ec71aa4
0x9ec71aa4: netbsd:mi_switch+0x10
0x9ec71ad4: netbsd:sleepq_block+0xb4
0x9ec71b14: netbsd:turnstile_block+0x318
0x9ec71b8c: netbsd:rw_enter+0x3c0
0x9ec71bbc: netbsd:genfs_lock+0x68
0x9ec71be4: netbsd:VOP_LOCK+0x40
0x9ec71c0c: netbsd:layer_lock+0x44
0x9ec71c34: netbsd:VOP_LOCK+0x40
0x9ec71c5c: netbsd:vn_lock+0x88
0x9ec71cac: netbsd:lookup_once+0x224
0x9ec71d7c: netbsd:namei_tryemulroot+0x528
0x9ec71db4: netbsd:namei+0x34
0x9ec71ddc: netbsd:fd_nameiat.isra.0+0x64
0x9ec71e4c: netbsd:do_sys_statat+0x84
0x9ec71f04: netbsd:sys___stat50+0x2c
0x9ec71f7c: netbsd:syscall+0xb8
0x9ec71fac: netbsd:swi_handler+0xa0
db{3}> bt/a 0x0000000096f831a0
trace: pid 28575 lid 1 at 0xa1f57aa4
0xa1f57aa4: netbsd:mi_switch+0x10
0xa1f57ad4: netbsd:sleepq_block+0xb4
0xa1f57b14: netbsd:turnstile_block+0x318
0xa1f57b8c: netbsd:rw_enter+0x3c0
0xa1f57bbc: netbsd:genfs_lock+0x68
0xa1f57be4: netbsd:VOP_LOCK+0x40
0xa1f57c0c: netbsd:layer_lock+0x44
0xa1f57c34: netbsd:VOP_LOCK+0x40
0xa1f57c5c: netbsd:vn_lock+0x88
0xa1f57cac: netbsd:lookup_once+0x224
0xa1f57d7c: netbsd:namei_tryemulroot+0x528
0xa1f57db4: netbsd:namei+0x34
0xa1f57ddc: netbsd:fd_nameiat.isra.0+0x64
0xa1f57e4c: netbsd:do_sys_statat+0x84
0xa1f57f04: netbsd:sys___stat50+0x2c
0xa1f57f7c: netbsd:syscall+0xb8
0xa1f57fac: netbsd:swi_handler+0xa0
db{3}> bt/a 96f83460
trace: pid 29934 lid 1 at 0x9e277a0c
0x9e277a0c: netbsd:mi_switch+0x10
0x9e277a3c: netbsd:sleepq_block+0xb4
0x9e277a7c: netbsd:turnstile_block+0x318
0x9e277af4: netbsd:rw_enter+0x3c0
0x9e277b24: netbsd:genfs_lock+0x68
0x9e277b4c: netbsd:VOP_LOCK+0x40
0x9e277b74: netbsd:layer_lock+0x44
0x9e277b9c: netbsd:VOP_LOCK+0x68
0x9e277bc4: netbsd:vn_lock+0x88
0x9e277bdc: netbsd:layerfs_root+0x38
0x9e277bfc: netbsd:VFS_ROOT+0x30
0x9e277c4c: netbsd:lookup_once+0x29c
0x9e277d1c: netbsd:namei_tryemulroot+0x528
0x9e277d54: netbsd:namei+0x34
0x9e277e2c: netbsd:vn_open+0x94
0x9e277eac: netbsd:do_open+0xb0
0x9e277edc: netbsd:do_sys_openat+0x7c
0x9e277f04: netbsd:sys_open+0x38
0x9e277f7c: netbsd:syscall+0xb8
0x9e277fac: netbsd:swi_handler+0xa0
db{3}> bt/a 9357ce20
trace: pid 23822 lid 1 at 0x9ce49a6c
0x9ce49a6c: netbsd:mi_switch+0x10
0x9ce49a9c: netbsd:sleepq_block+0xb4
0x9ce49adc: netbsd:turnstile_block+0x318
0x9ce49b54: netbsd:rw_enter+0x3c0
0x9ce49b84: netbsd:genfs_lock+0x68
0x9ce49bac: netbsd:VOP_LOCK+0x40
0x9ce49bd4: netbsd:layer_lock+0x44
0x9ce49bfc: netbsd:VOP_LOCK+0x68
0x9ce49c24: netbsd:vn_lock+0x88
0x9ce49c3c: netbsd:layerfs_root+0x38
0x9ce49c5c: netbsd:VFS_ROOT+0x30
0x9ce49cac: netbsd:lookup_once+0x29c
0x9ce49d7c: netbsd:namei_tryemulroot+0x528
0x9ce49db4: netbsd:namei+0x34
0x9ce49ddc: netbsd:fd_nameiat.isra.0+0x64
0x9ce49e4c: netbsd:do_sys_statat+0x84
0x9ce49f04: netbsd:sys___stat50+0x2c
0x9ce49f7c: netbsd:syscall+0xb8
0x9ce49fac: netbsd:swi_handler+0xa0
db{3}> bt/a 92fec960
trace: pid 2319 lid 1 at 0x9d483a0c
0x9d483a0c: netbsd:mi_switch+0x10
0x9d483a3c: netbsd:sleepq_block+0xb4
0x9d483a7c: netbsd:turnstile_block+0x318
0x9d483af4: netbsd:rw_enter+0x3c0
0x9d483b24: netbsd:genfs_lock+0x68
0x9d483b4c: netbsd:VOP_LOCK+0x40
0x9d483b74: netbsd:layer_lock+0x44
0x9d483b9c: netbsd:VOP_LOCK+0x68
0x9d483bc4: netbsd:vn_lock+0x88
0x9d483bdc: netbsd:layerfs_root+0x38
0x9d483bfc: netbsd:VFS_ROOT+0x30
0x9d483c4c: netbsd:lookup_once+0x29c
0x9d483d1c: netbsd:namei_tryemulroot+0x528
0x9d483d54: netbsd:namei+0x34
0x9d483e2c: netbsd:vn_open+0x94
0x9d483eac: netbsd:do_open+0xb0
0x9d483edc: netbsd:do_sys_openat+0x7c
0x9d483f04: netbsd:sys_open+0x38
0x9d483f7c: netbsd:syscall+0xb8
0x9d483fac: netbsd:swi_handler+0xa0
db{3}> bt/a 91c733e0
trace: pid 0 lid 67 at 0x9aaa9d64
0x9aaa9d64: netbsd:mi_switch+0x10
0x9aaa9d94: netbsd:sleepq_block+0xb4
0x9aaa9dd4: netbsd:turnstile_block+0x318
0x9aaa9e4c: netbsd:rw_enter+0x3c0
0x9aaa9e7c: netbsd:genfs_lock+0x68
0x9aaa9ea4: netbsd:VOP_LOCK+0x40
0x9aaa9ecc: netbsd:vn_lock+0x88
0x9aaa9f2c: netbsd:ffs_sync+0xb0
0x9aaa9f4c: netbsd:VFS_SYNC+0x30
0x9aaa9fac: netbsd:sched_sync+0x27c
db{3}> bt/a 91596840
trace: pid 0 lid 9 at 0x9a825d74
0x9a825d74: netbsd:mi_switch+0x10
0x9a825da4: netbsd:sleepq_block+0xb4
0x9a825de4: netbsd:turnstile_block+0x318
0x9a825e5c: netbsd:rw_enter+0x3c0
0x9a825e8c: netbsd:genfs_lock+0x68
0x9a825eb4: netbsd:VOP_LOCK+0x40
0x9a825edc: netbsd:layer_lock+0x44
0x9a825f04: netbsd:VOP_LOCK+0x40
0x9a825f2c: netbsd:vn_lock+0x88
0x9a825f5c: netbsd:vclean+[...]
[...]: netbsd:cleanvnode+0xf4
0x9a825fac: netbsd:vdrain_thread+0x68
db{3}>
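The owner and waiter addresses in the four "show lock" outputs above can be followed mechanically to check the interpretation that PID 14346 sits at the head of the chain. The following is a minimal Python sketch, not part of the original debugging session: the tables are transcribed by hand from the output in this report, and the names LOCKS, PIDS, blocked_on, chain and roots are made up for illustration. It only models waits on the rwlock turnstiles, so an LWP sleeping elsewhere (PID 14346 is in cv_wait under vwait) shows up as the end of a chain rather than as a waiter.

```python
# Wait-for data transcribed from the "show lock" output in this report.
# lock address -> (owner LWP, list of waiting writer LWPs)
LOCKS = {
    "92b1d834": ("93450300", ["96f831a0"]),
    "922b78e4": ("93dea080", ["92fec960", "9357ce20", "96f83460"]),
    "935fb98c": ("93983120", ["93dea080"]),
    "951f781c": ("96f831a0", ["91596840", "91c733e0", "93983120"]),
}

# LWP address -> pid, from the ps/l and bt/a output in this report.
# The two pid 0 entries are the ioflush and vdrain kernel threads.
PIDS = {
    "93450300": 14346, "96f831a0": 28575, "93dea080": 28524,
    "93983120": 21780, "96f83460": 29934, "9357ce20": 23822,
    "92fec960": 2319, "91c733e0": 0, "91596840": 0,
}


def blocked_on(locks):
    """Map each waiting LWP to the LWP that owns the lock it wants."""
    return {w: owner for owner, waiters in locks.values() for w in waiters}


def chain(lwp, edges):
    """Follow owner links from lwp until reaching an LWP that is not
    waiting on any listed lock, or until a cycle is detected."""
    path = [lwp]
    while path[-1] in edges:
        nxt = edges[path[-1]]
        path.append(nxt)
        if nxt in path[:-1]:  # cycle: stop instead of looping forever
            break
    return path


def roots(locks):
    """Return the set of LWPs at the end of every wait chain."""
    edges = blocked_on(locks)
    return {chain(w, edges)[-1] for w in edges}


if __name__ == "__main__":
    edges = blocked_on(LOCKS)
    for lwp in sorted(edges):
        print(" -> ".join(f"{a} (pid {PIDS[a]})" for a in chain(lwp, edges)))
    print("root holders:", sorted(roots(LOCKS)))
```

With the data above, every chain ends at LWP 93450300, which matches the reading that PID 14346 holds the lock everything else is ultimately queued behind. It does not say what that LWP itself is waiting for in vwait.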
>How-To-Repeat:
Build pbulk packages on top of layerfs
>Fix: