Subject: parallel make locking up (on amd64)
To: None <current-users@netbsd.org>
From: Kurt Schreiner <ks@ub.uni-mainz.de>
List: port-amd64
Date: 07/12/2006 16:49:44
Hi,
while "torturing" my shiny new Sun Ultra 40 I tried a few "build.sh -j7" runs.
They run for a while, but eventually the make processes lock up, WAITing on vnlock.
The lockup can (more or less) be reproduced by "reboot; login; build.sh -j7"...
Filesystems are set up as follows:
/dev/wd1g on /u type ffs (noatime, soft dependencies, local)
mfs:698 on /tmp type mfs (synchronous, nosuid, nodev, noatime, local)
<above>:/u/NetBSD/lsrc on /u/NetBSD/src.060711 type union (nosuid, nodev, local, mounted by ks)
The parameters to build.sh are:
./build.sh -N 1 -j 7 -x -U -m amd64 -O /u/NetBSD/arch/amd64/obj \
-D /u/NetBSD/arch/amd64/dest -T /u/NetBSD/arch/amd64/TOOLS
DDB (on serial console ;-) shows:
db{0}> ps
PID    PPID  PGRP  UID  S  FLAGS   LWPS  COMMAND  WAIT
8675   1     8675  0    2  0x4002  1     getty    ttyin
3366   3573  7364  77   2  0x4002  1     less     ttyin
3573   7364  7364  77   2  0x4002  1     sh       wait
7364   3355  7364  77   2  0x4002  1     man      wait
12189  1     7611  77   2  0x4002  1     nbmake   vnlock
7935   1     7217  77   2  0x4002  1     nbmake   vnlock
5841   1     3746  77   2  0x4002  1     nbmake   vnlock
4683   1     3379  77   2  0x4002  1     nbmake   vnlock
3095   1     1997  77   2  0x4002  1     nbmake   vnlock
7294   1     5283  77   2  0x4002  1     nbmake   vnlock
3355   2895  3355  77   2  0x4002  1     tcsh     pause
2895   3517  3517  77   2  0x100   1     sshd     select
3517   636   3517  0    2  0x4101  1     sshd     netio
1207   918   1207  77   2  0x4002  1     tcsh     ttyin
918    244   244   77   2  0x100   1     sshd     select
244    636   244   0    2  0x4101  1     sshd     netio
243    1     243   0    2  0x4002  1     getty    ttyin
242    1     242   0    2  0x4002  1     getty    ttyin
241    1     241   0    2  0x4002  1     getty    ttyin
235    1     235   0    2  0       1     cron     nanosle
233    1     233   0    2  0       1     inetd    kqread
db{0}> trace/t 0t5841
trace: pid 5841 at 0xffff800057a326a0
ltsleep() at netbsd:ltsleep+0x3df
acquire() at netbsd:acquire+0x17d
lockmgr() at netbsd:lockmgr+0x367
VOP_LOCK() at netbsd:VOP_LOCK+0x25
vn_lock() at netbsd:vn_lock+0x99
cache_lookup() at netbsd:cache_lookup+0x2f9
ufs_lookup() at netbsd:ufs_lookup+0xdc
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x27
union_lookup1() at netbsd:union_lookup1+0x42
union_lookup() at netbsd:union_lookup+0xd9
VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x27
lookup() at netbsd:lookup+0x296
namei() at netbsd:namei+0x16a
vn_open() at netbsd:vn_open+0x164
sys_open() at netbsd:sys_open+0xdd
syscall_plain() at netbsd:syscall_plain+0x122
kernel: page fault trap, code=0
Faulted in DDB; continuing...
db{0}> trace/t 0t7294
trace: pid 7294 at 0xffff8000581b9b60
ltsleep() at netbsd:ltsleep+0x3df
acquire() at netbsd:acquire+0x17d
lockmgr() at netbsd:lockmgr+0x680
VOP_LOCK() at netbsd:VOP_LOCK+0x25
vn_lock() at netbsd:vn_lock+0x99
union_lock() at netbsd:union_lock+0x7f
VOP_LOCK() at netbsd:VOP_LOCK+0x25
vn_lock() at netbsd:vn_lock+0x99
vn_readdir() at netbsd:vn_readdir+0xcb
sys___getdents30() at netbsd:sys___getdents30+0xaa
syscall_plain() at netbsd:syscall_plain+0x122
kernel: page fault trap, code=0
Faulted in DDB; continuing...
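In case it helps anyone collect traces for the remaining stuck processes, here is a rough sketch (Python, not part of the original session) that picks the PIDs waiting on vnlock out of a captured "ps" listing and prints the matching "trace/t" commands. The helper name stuck_pids and the sample text are made up for illustration. It assumes the column layout shown above, with the PID first and the wait channel as the last field on each line.

```python
def stuck_pids(ps_text, wchan="vnlock"):
    """Return the PIDs whose last column (WAIT) equals wchan.

    Hypothetical helper: assumes whitespace-separated columns with the
    PID first and the wait channel last, as in the DDB listing above.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header, the db prompt and blank lines: data rows start with a numeric PID.
        if len(fields) >= 2 and fields[0].isdigit() and fields[-1] == wchan:
            pids.append(int(fields[0]))
    return pids


# A few rows copied from the listing above, used only as sample input.
sample = """\
PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
3573 7364 7364 77 2 0x4002 1 sh wait
12189 1 7611 77 2 0x4002 1 nbmake vnlock
7935 1 7217 77 2 0x4002 1 nbmake vnlock
"""

for pid in stuck_pids(sample):
    # DDB takes decimal numbers with a 0t prefix, as in the traces above.
    print(f"trace/t 0t{pid}")
```

A row with an empty WAIT column would have the command name as its last field, so it is simply not matched.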
Is there anything I can do to help debug this? Should I file a PR with send-pr?
Kurt