Subject: Re: parallel make locking up (on amd64)
To: Martin Husemann <martin@duskware.de>
From: Kurt Schreiner <ks@ub.uni-mainz.de>
List: port-amd64
Date: 07/13/2006 13:59:25
On Wed, Jul 12, 2006 at 09:35:50PM +0200, Martin Husemann wrote:
> On Wed, Jul 12, 2006 at 04:49:44PM +0200, Kurt Schreiner wrote:
> > Is there anything I can do to help debugging this? Sendpr?
> 
> Often a kernel with options LOCKDEBUG helps to debug this kind of problems.
Hm, I'm running a kernel w/ "options LOCKDEBUG" now; what can I do next?
The makes are hung again, but I only get these hangs when a union mount is
involved. Softdeps or not makes no difference: adding the union-mounted
"local modifications" (which are empty, btw) reliably leads to hung makes.
Hm! So softdeps (which christos suggested) don't seem to be the culprit
this time...

Here's the output from ddb:

login: ~Stopped at      netbsd:cpu_Debugger+0x5:        leave
db{0}> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 25375        23247    25375         77 2  0x4002    1             tcsh   ttyin
 23247        15946    15946         77 2   0x100    1             sshd  select
 15946         1085    15946          0 2  0x4101    1             sshd   netio
 684              1      466         77 2  0x4002    1           nbmake  vnlock
 436              1     3830         77 2  0x4002    1           nbmake  vnlock
 312           1143      312         77 2  0x4002    1             tcsh   ttyin
 1143           245      245         77 2   0x100    1             sshd  select
 245           1085      245          0 2  0x4101    1             sshd   netio
 244              1      244          0 2  0x4002    1            getty   ttyin
 243              1      243          0 2  0x4002    1            getty   ttyin
 242              1      242          0 2  0x4002    1            getty   ttyin
 241              1      241          0 2  0x4002    1            getty   ttyin
 236              1      236          0 2       0    1             cron nanosle
 234              1      234          0 2       0    1            inetd  kqread
 1085             1     1085          0 2       0    1             sshd  select
 166              1      166         15 2   0x100    1             ntpd   pause
 98             925      925          0 2       0    1             nfsd    nfsd
 97             925      925          0 2       0    1             nfsd    nfsd
 96             925      925          0 2       0    1             nfsd    nfsd
 919            925      925          0 2       0    1             nfsd    nfsd
 925              1      925          0 2       0    1             nfsd    poll
db{0}> bt/t 0t436
trace: pid 436  at 0xffff80006310e9f0
ltsleep() at netbsd:ltsleep+0x42a
acquire() at netbsd:acquire+0x25c
_lockmgr() at netbsd:_lockmgr+0xb05
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0x97
union_lock() at netbsd:union_lock+0x7f
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0x97
union_dircache() at netbsd:union_dircache+0x22
union_readdirhook() at netbsd:union_readdirhook+0x61
vn_readdir() at netbsd:vn_readdir+0x13b
sys___getdents30() at netbsd:sys___getdents30+0xec
syscall_plain() at netbsd:syscall_plain+0x122
kernel: page fault trap, code=0
Faulted in DDB; continuing...
db{0}> bt/t 0t684
trace: pid 684  at 0xffff800063152b50
ltsleep() at netbsd:ltsleep+0x42a
acquire() at netbsd:acquire+0x25c
_lockmgr() at netbsd:_lockmgr+0x8d7
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0x97
union_lock() at netbsd:union_lock+0x7f
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0x97
vn_readdir() at netbsd:vn_readdir+0xcb
sys___getdents30() at netbsd:sys___getdents30+0xec
syscall_plain() at netbsd:syscall_plain+0x122
kernel: page fault trap, code=0
Faulted in DDB; continuing...
db{0}> 

Kurt