tech-kern archive
Re: netbsd-5 NFS(?) lock up
On Sun, Mar 29, 2009 at 09:49:58PM +0200, Manuel Bouyer wrote:
> Hi,
> trying to upgrade an x86 NFS server from netbsd-3 to netbsd-5 has been
> a fiasco. The kernel locks up within seconds after going multiuser, even
> with SMP disabled in the BIOS (the kernel indeed sees only one CPU).
> LOCKDEBUG doesn't help, the kernel is just dead, all I can do is enter
> ddb on console.
>
> Here's what I've been able to collect so far (this is with hyperthreading
> enabled in BIOS so kernel sees 2 CPUs). Hardware is an Intel x86 with 3GHz
> Xeon CPU (one of the first EM64T xeons I think), 1G RAM. Disk drives are
> 2 wd(4) behind a piixide and 6 sd(4) behind two esiop(4), raid-1 raidframe on
> all disks. raid-1 parity reconstruct is running when the lockup occurs;
> and I suspect some NFS activity too (maybe several 100s of requests/s).
> There is also samba running, but this one should be almost idle.
I set up a test box with a similar configuration (the hardware is not 100%
identical, unfortunately) and got a LOCKDEBUG panic:
Mutex error: lockdebug_wantlock: locking against myself
lock address : 0x00000000ce88c028 type : sleep/adaptive
initialized : 0x00000000c03a5052
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 11
current cpu : 1 last held: 1
current lwp : 0x00000000ce8edcc0 last held: 0x00000000ce8edcc0
last locked : 0x00000000c03aeff4 unlocked : 0x00000000c025da80
owner field : 0x00000000ce8edcc0 wait/spin: 1/0
Turnstile chain at 0xc0704e60.
=> Turnstile at 0xce955788 (wrq=0xce955798, rdq=0xce9557a0).
=> 0 waiting readers:
=> 10 waiting writers: 0xce943d00 0xce943300 0xce8f2560 0xce8ed7c0 0xce465020
0xce8f22e0 0xce8f2a60 0xce8eda40 0xce8f2ce0 0xce4652a0
panic: LOCKDEBUG
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c03f0e2c cs 8 eflags 246 cr2 cdee3000 ilevel 0
Stopped in pid 261.1 (nfsd) at netbsd:breakpoint+0x4: popl %ebp
db{1}> tr
breakpoint(c0641842,ce918728,c2cec800,c035aaaf,0,1,0,0,ce918728,8) at
netbsd:breakpoint+0x4
panic(c0641844,c063d60e,c051629b,c063d5dd,b4a8,18edcc0,0,d08fef18,0,ce88c028)
at netbsd:panic+0x1b0
lockdebug_abort1(c063d5dd,1,0,0,cbf524d0,ce8ede78,0,6,d0821438,ce918b10) at
netbsd:lockdebug_abort1+0xbb
mutex_vector_enter(ce88c028,11,ce918b6c,c025cc47,ce88c000,0,cbf66300,ce44692c,c3b92c00,ce918b58)
at netbsd:mutex_vector_enter+0x464
genfs_renamelock_enter(ce88c000,0,cbf66300,ce44692c,c3b92c00,ce918b58,ce918b54,ce918b44,ce8edcc0,0)
at netbsd:genfs_renamelock_enter+0x14
nfsrv_rename(d0c8ca20,ce44692c,ce8edcc0,ce918bd0,cd117b40,c0701d58,0,c2cec918,c0701d58,0)
at netbsd:nfsrv_rename+0x4b7
nfssvc_nfsd(ce918c38,804a2e0,ce8edcc0,0,0,0,0,0,0,ffffffff) at
netbsd:nfssvc_nfsd+0x3d6
sys_nfssvc(ce8edcc0,ce918d00,ce918d28,bfbff000,ce478684,ce478684,2,4,804a2e0,bfbfee94)
at netbsd:sys_nfssvc+0x332
syscall(ce918d48,b3,ab,bfbf001f,bbbd001f,11,1,bfbfee94,0,bfbffff0) at
netbsd:syscall+0xc8
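For readers unfamiliar with the message: "locking against myself" is LOCKDEBUG
reporting that the LWP asking for the mutex is already its owner (in the report
above, "current lwp" and "owner field" are both 0xce8edcc0). The following is a
user-space analogy in Python, not kernel code; CheckedMutex, enter, exit and
rename_like_path are hypothetical names used only for illustration, and the
sketch says nothing about why nfsrv_rename reaches this state.

```python
import threading


class CheckedMutex:
    """Non-recursive mutex that remembers its owner (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def enter(self):
        me = threading.get_ident()
        # Without this check the second acquire() below would block forever,
        # which is the silent hang seen when no lock debugging is active.
        if self._owner == me:
            raise RuntimeError("locking against myself")
        self._lock.acquire()
        self._owner = me

    def exit(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("unlocking a mutex not held by this thread")
        self._owner = None
        self._lock.release()


def rename_like_path(mutex):
    """Take the mutex, then take it again on the same thread."""
    mutex.enter()
    try:
        mutex.enter()  # same thread, same mutex: reported as an error
    finally:
        mutex.exit()   # release the first (successful) acquisition
```

Calling rename_like_path(CheckedMutex()) raises RuntimeError instead of
hanging, which is roughly what LOCKDEBUG turns the lock-up into.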
db{1}> mach cpu 0
using CPU 0
db{1}> tr
__cpu_simple_lock(c2dee000,0,c01002a7,0,c01002a7,0,0,0,0,0) at
netbsd:__cpu_simple_lock+0xd
db{1}> ps /l
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
1480 1 3 1 84 cea4e000 raidctl nanoslp
443 1 3 1 84 cea03ac0 raidctl nanoslp
1443 1 3 1 84 cea4e280 tcsh pause
1760 1 3 1 84 ce9540a0 ksh pause
567 1 3 1 84 cea4ea00 tcsh pause
570 1 3 1 84 cea4ec80 top select
558 1 3 0 80 cea1d0e0 tcsh pause
556 1 3 1 84 cea1d360 screen-4.0.3 select
559 1 3 1 84 cea1d5e0 screen-4.0.3 pause
289 1 3 0 80 cea1d860 tcsh pause
446 1 3 1 84 cea1dae0 sshd select
465 1 3 0 80 cea1dd60 sshd netio
504 1 3 0 80 cea030c0 getty ttyraw
409 1 3 1 80 cea035c0 getty ttyraw
414 1 3 1 80 ce326860 getty ttyraw
509 1 3 1 84 ce326ae0 getty ttyraw
501 1 3 1 84 cea03340 cron nanoslp
502 1 3 1 84 ce954320 inetd kqueue
397 1 3 0 84 ce465a20 sh wait
490 1 3 1 80 ce943080 sh wait
358 1 3 1 84 cea03d40 smartd nanoslp
319 1 3 1 84 ce9545a0 sendmail pause
435 1 3 0 80 ce954aa0 sshd select
332 1 2 1 1000004 ce954820 ntpd
98 1 3 1 84 ce954d20 rpc.lockd select
285 1 3 1 84 ce4657a0 rpc.statd select
276 1 3 0 4 ce943300 nfsd tstile
270 1 3 0 4 ce943580 nfsd tstile
279 1 3 0 4 ce943800 nfsd tstile
282 1 2 1 4 ce943a80 nfsd
278 1 3 1 4 ce943d00 nfsd tstile
208 1 3 0 4 ce8f2060 nfsd tstile
271 1 3 1 4 ce8f22e0 nfsd tstile
280 1 3 1 4 ce8f2560 nfsd tstile
265 1 3 0 4 ce8f27e0 nfsd tstile
264 1 3 1 4 ce8f2a60 nfsd tstile
277 1 3 1 4 ce8f2ce0 nfsd tstile
266 1 3 0 4 ce8ed040 nfsd tstile
251 1 3 0 4 ce8ed2c0 nfsd tstile
275 1 3 0 4 ce8ed540 nfsd tstile
274 1 3 0 4 ce8ed7c0 nfsd tstile
259 1 3 0 4 ce8eda40 nfsd tstile
261 > 1 7 1 4 ce8edcc0 nfsd
263 1 3 1 4 ce465020 nfsd tstile
249 1 3 0 4 ce4652a0 nfsd tstile
260 1 3 0 4 ce413280 nfsd tstile
252 1 3 0 84 ce413000 nfsd select
237 1 3 1 84 ce465520 mountd select
203 1 3 1 84 ce413780 rpcbind select
159 1 3 0 84 ce413500 syslogd kqueue
134 1 3 0 84 ce3265e0 dhclient select
1 1 3 1 84 cbf76aa0 init wait
0 62 3 0 204 cea03840 raid_parity rfwcond
58 3 1 204 ce465ca0 physiod physiod
> 57 7 0 204 ce413a00 raidio3
56 2 0 204 ce413c80 raid3
55 3 1 204 ce3260e0 raidio2 raidiow
54 3 1 204 ce326360 raid2 rfwcond
53 3 0 204 ce326d60 vmem_rehash vmem_rehash
52 3 0 204 ce3220c0 aiodoned aiodoned
51 3 0 40204 ce322340 ioflush syncer
50 3 1 204 ce3225c0 pgdaemon pgdaemon
49 3 1 204 ce322840 raidio1 raidiow
48 3 1 204 ce322ac0 raid1 rfwcond
47 3 0 204 ce322d40 raidio0 raidiow
46 3 0 204 cbf760a0 raid0 rfwcond
45 3 0 204 cbf75300 cryptoret crypto_wait
42 3 0 204 cbf75080 usb2 usbevt
41 3 1 204 cbf75800 usb3 usbevt
40 3 1 204 cbf75580 usb0 usbevt
39 3 0 204 cbf76320 usbtask-dr usbtsk
38 3 0 204 cbf76d20 usbtask-hc usbtsk
37 3 1 204 cbf76820 usb1 usbevt
36 3 0 204 cbf765a0 unpgc unpgc
27 3 1 204 cbf75a80 iic0 iicintr
26 3 0 204 cbf75d00 atabus3 atath
25 3 1 204 cbf74060 atabus2 atath
24 3 1 204 cbf742e0 atabus1 atath
23 3 0 204 cbf74560 atabus0 atath
22 3 0 204 cbf747e0 scsibus9 sccomp
21 3 1 204 cbf74a60 scsibus8 sccomp
20 3 0 204 cbf74ce0 pms0 pmsreset
19 3 1 204 cbf72040 apm0 apmev
18 3 1 204 cbf722c0 xcall/1 xcall
17 1 1 204 cbf72540 softser/1
16 1 1 204 cbf727c0 softclk/1
15 1 1 204 cbf72a40 softbio/1
14 1 1 204 cbf72cc0 softnet/1
13 1 1 205 cbf6a020 idle/1
12 3 0 204 cbf6a2a0 sysmon smtaskq
11 3 0 204 cbf6a520 pmfevent pmfevent
10 3 0 204 cbf6a7a0 nfssilly nfssilly
9 3 1 204 cbf6aa20 cachegc cachegc
8 3 1 204 cbf6aca0 vrele vrele
7 3 0 204 cbf67000 xcall/0 xcall
6 1 0 204 cbf67280 softser/0
5 1 0 204 cbf67500 softclk/0
4 1 0 204 cbf67780 softbio/0
3 1 0 204 cbf67a00 softnet/0
2 1 0 205 cbf67c80 idle/0
1 3 0 204 c0699ee0 swapper schedule
db{1}> tr/a 0xce413a00
trace: pid 0 lid 57 at 0xce436d2c
The box is still in ddb; is there anything else I should check?
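
The waiting writers listed in the turnstile can be matched by hand against the
STRUCT LWP column of the ps output; a small script makes that less error-prone.
This is only a sketch: the input strings are copied from the output above (the
ps excerpt is limited to the nfsd LWPs), parse_ps is a hypothetical helper, and
the parsing assumes the column layout shown in this message.

```python
import re

# Copied from the "waiting writers" lines of the LOCKDEBUG report.
TURNSTILE_WRITERS = """
0xce943d00 0xce943300 0xce8f2560 0xce8ed7c0 0xce465020
0xce8f22e0 0xce8f2a60 0xce8eda40 0xce8f2ce0 0xce4652a0
"""
OWNER = "0xce8edcc0"  # "owner field" of the mutex

# Excerpt of "ps /l": PID LID S CPU FLAGS STRUCT-LWP NAME WAIT
PS_EXCERPT = """
276 1 3 0 4 ce943300 nfsd tstile
270 1 3 0 4 ce943580 nfsd tstile
279 1 3 0 4 ce943800 nfsd tstile
282 1 2 1 4 ce943a80 nfsd
278 1 3 1 4 ce943d00 nfsd tstile
208 1 3 0 4 ce8f2060 nfsd tstile
271 1 3 1 4 ce8f22e0 nfsd tstile
280 1 3 1 4 ce8f2560 nfsd tstile
265 1 3 0 4 ce8f27e0 nfsd tstile
264 1 3 1 4 ce8f2a60 nfsd tstile
277 1 3 1 4 ce8f2ce0 nfsd tstile
266 1 3 0 4 ce8ed040 nfsd tstile
251 1 3 0 4 ce8ed2c0 nfsd tstile
275 1 3 0 4 ce8ed540 nfsd tstile
274 1 3 0 4 ce8ed7c0 nfsd tstile
259 1 3 0 4 ce8eda40 nfsd tstile
261 > 1 7 1 4 ce8edcc0 nfsd
263 1 3 1 4 ce465020 nfsd tstile
249 1 3 0 4 ce4652a0 nfsd tstile
260 1 3 0 4 ce413280 nfsd tstile
"""

LWP_ADDR = re.compile(r"^[0-9a-f]{8}$")


def parse_ps(text):
    """Map LWP address -> (pid, name, wait channel) for each ps line."""
    lwps = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # The address follows PID, LID, state, CPU and flags, so skip
            # the first four columns before looking for 8 hex digits.
            if i >= 4 and LWP_ADDR.match(tok):
                name = tokens[i + 1] if i + 1 < len(tokens) else ""
                wait = tokens[i + 2] if i + 2 < len(tokens) else ""
                lwps[int(tok, 16)] = (int(tokens[0]), name, wait)
                break
    return lwps


lwps = parse_ps(PS_EXCERPT)
waiters = [int(addr, 16) for addr in TURNSTILE_WRITERS.split()]

print("owner :", lwps.get(int(OWNER, 16)))
for addr in waiters:
    print("waiter: 0x%08x" % addr, lwps.get(addr, "not in excerpt"))
```

In this excerpt the owner address resolves to pid 261, the nfsd that ddb
stopped in, and each of the ten waiting writers resolves to an nfsd sleeping
on tstile.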
--
Manuel Bouyer, LIP6, Universite Paris VI.
Manuel.Bouyer%lip6.fr@localhost
NetBSD: 26 years of experience will always make the difference