Subject: Re: parallel make locking up (on amd64)
To: None <current-users@netbsd.org>
From: Christos Zoulas <christos@astron.com>
List: current-users
Date: 07/12/2006 19:53:13
In article <20060712144944.GE16050@sb1001.name>,
Kurt Schreiner <ks@ub.uni-mainz.de> wrote:
>Hi,
>
>"torturing" my shiny new Sun Ultra 40 I tried some "build.sh -j7" runs, which ran for
>a while, but eventually the make processes lock up WAITing on vnlock.
>The lockup can (more or less) be reproduced by "reboot; login; build.sh -j7"...
>Filesystems are set up as follows:
>
>/dev/wd1g on /u type ffs (noatime, soft dependencies, local)
>mfs:698 on /tmp type mfs (synchronous, nosuid, nodev, noatime, local)
><above>:/u/NetBSD/lsrc on /u/NetBSD/src.060711 type union (nosuid,
>nodev, local, mounted by ks)
>
>Parameters to build.sh are:
>
>./build.sh -N 1 -j 7 -x -U -m amd64 -O /u/NetBSD/arch/amd64/obj \
> -D /u/NetBSD/arch/amd64/dest -T /u/NetBSD/arch/amd64/TOOLS
>
>DDB (on serial console ;-) shows:
>
>db{0}> ps
> PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
> 8675 1 8675 0 2 0x4002 1 getty ttyin
> 3366 3573 7364 77 2 0x4002 1 less ttyin
> 3573 7364 7364 77 2 0x4002 1 sh wait
> 7364 3355 7364 77 2 0x4002 1 man wait
> 12189 1 7611 77 2 0x4002 1 nbmake vnlock
> 7935 1 7217 77 2 0x4002 1 nbmake vnlock
> 5841 1 3746 77 2 0x4002 1 nbmake vnlock
> 4683 1 3379 77 2 0x4002 1 nbmake vnlock
> 3095 1 1997 77 2 0x4002 1 nbmake vnlock
> 7294 1 5283 77 2 0x4002 1 nbmake vnlock
> 3355 2895 3355 77 2 0x4002 1 tcsh pause
> 2895 3517 3517 77 2 0x100 1 sshd select
> 3517 636 3517 0 2 0x4101 1 sshd netio
> 1207 918 1207 77 2 0x4002 1 tcsh ttyin
> 918 244 244 77 2 0x100 1 sshd select
> 244 636 244 0 2 0x4101 1 sshd netio
> 243 1 243 0 2 0x4002 1 getty ttyin
> 242 1 242 0 2 0x4002 1 getty ttyin
> 241 1 241 0 2 0x4002 1 getty ttyin
> 235 1 235 0 2 0 1 cron nanosle
> 233 1 233 0 2 0 1 inetd kqread
>
>
>db{0}> trace/t 0t5841
>trace: pid 5841 at 0xffff800057a326a0
>ltsleep() at netbsd:ltsleep+0x3df
>acquire() at netbsd:acquire+0x17d
>lockmgr() at netbsd:lockmgr+0x367
>VOP_LOCK() at netbsd:VOP_LOCK+0x25
>vn_lock() at netbsd:vn_lock+0x99
>cache_lookup() at netbsd:cache_lookup+0x2f9
>ufs_lookup() at netbsd:ufs_lookup+0xdc
>VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x27
>union_lookup1() at netbsd:union_lookup1+0x42
>union_lookup() at netbsd:union_lookup+0xd9
>VOP_LOOKUP() at netbsd:VOP_LOOKUP+0x27
>lookup() at netbsd:lookup+0x296
>namei() at netbsd:namei+0x16a
>vn_open() at netbsd:vn_open+0x164
>sys_open() at netbsd:sys_open+0xdd
>syscall_plain() at netbsd:syscall_plain+0x122
>kernel: page fault trap, code=0
>Faulted in DDB; continuing...
>
>db{0}> trace/t 0t7294
>trace: pid 7294 at 0xffff8000581b9b60
>ltsleep() at netbsd:ltsleep+0x3df
>acquire() at netbsd:acquire+0x17d
>lockmgr() at netbsd:lockmgr+0x680
>VOP_LOCK() at netbsd:VOP_LOCK+0x25
>vn_lock() at netbsd:vn_lock+0x99
>union_lock() at netbsd:union_lock+0x7f
>VOP_LOCK() at netbsd:VOP_LOCK+0x25
>vn_lock() at netbsd:vn_lock+0x99
>vn_readdir() at netbsd:vn_readdir+0xcb
>sys___getdents30() at netbsd:sys___getdents30+0xaa
>syscall_plain() at netbsd:syscall_plain+0x122
>kernel: page fault trap, code=0
>Faulted in DDB; continuing...
>
>Is there anything I can do to help debug this? Send a PR?
Yes, try without using softdeps.
christos