Subject: port-powerpc/16069: [dM] locking lossage
To: None <gnats-bugs@gnats.netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: netbsd-bugs
Date: 03/26/2002 11:16:26
>Number: 16069
>Category: port-powerpc
>Synopsis: [dM] locking lossage
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-powerpc-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 26 08:18:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator: der Mouse
>Release: Proprietary PPC port derived from 1.5W
>Organization:
Dis-
>Environment:
Proprietary PPC port derived from 1.5W
>Description:
I've been asked to post this by a company that's been working
with NetBSD. I don't know much about the problem beyond what's
here, as I did not see this myself, but I can get mail back to
the actual originators. (I realize this may seem to be a
rather disorganized collection of information; it's what they
sent me, and as I say, I didn't see it happen myself. I'm not
entirely sure why they asked me to post it instead of doing it
themselves. I'm trying to help them some in tracking it down
myself; if anyone has any ideas, I/we would most appreciate
hearing them.)
Symptoms:
1) kernel panic as a result of DSI trap
2) gdb locked on a "vmmaplk" channel, kernel deadlock, lost console, can only go to db;
cannot kill deadlocked processes from within db
3) gdb locked on a "uvn_fp2" channel, kernel deadlock, lost console, can only go to db;
cannot kill deadlocked processes from within db
Proprietary NetBSD PPC port derived from the 1.5W
LOCKDEBUG not defined, not a SMP config.
/* $NetBSD: param.h,v 1.128 2001/06/03 02:48:45 thorpej Exp $ */
#define __NetBSD_Version__ 105230000 /* NetBSD 1.5W */
/* $NetBSD: sys_process.c,v 1.67 2001/03/17 09:38:36 pooka Exp $ */
/* $NetBSD: kern_synch.c,v 1.104 2001/05/28 22:20:03 chs Exp $ */
/* $NetBSD: kern_lock.c,v 1.55 2001/06/05 04:38:09 thorpej Exp $ */
/* $NetBSD: uvm_fault_i.h,v 1.13 2001/06/02 18:09:26 chs Exp $ */
/* $NetBSD: uvm_map.h,v 1.28 2001/06/02 18:09:27 chs Exp $ */
/* $NetBSD: uvm_fault.c,v 1.64 2001/06/02 18:09:26 chs Exp $ */
/* $NetBSD: uvm_map.c,v 1.99 2001/06/02 18:09:26 chs Exp $ */
/* $NetBSD: uvm_io.c,v 1.15 2001/06/02 18:09:26 chs Exp $ */
/* $NetBSD: uvm_vnode.c,v 1.50 2001/05/26 21:27:21 chs Exp $ */
/* $NetBSD: procfs_mem.c,v 1.27 2000/11/24 18:58:37 chs Exp $ */
/* $NetBSD: layer_vnops.c,v 1.6 2001/06/07 13:32:47 wiz Exp $ */
In the platform-dependent part, arch/my_ppc:
/* $NetBSD: cpu.c,v 1.1 2000/02/29 15:21:46 nonaka Exp $ */
/* $NetBSD: locore.s,v 1.8 2000/11/16 05:38:33 thorpej Exp $ */
/* $NetBSD: machdep.c,v 1.11 2000/09/13 15:00:22 thorpej Exp $ */
In arch/powerpc:
/* $NetBSD: Locore.c,v 1.4 2000/06/08 06:48:45 kleink Exp $ */
/* $NetBSD: locore_subr.S,v 1.2 2001/02/28 20:44:41 tsubai Exp $ */
/* $NetBSD: mem.c,v 1.9 2001/02/04 17:38:11 briggs Exp $ */
/* $NetBSD: pmap.c,v 1.44 2001/06/10 11:01:27 tsubai Exp $ */
/* $NetBSD: powerpc_machdep.c,v 1.4 2001/04/05 09:58:05 tsubai Exp $ */
/* $NetBSD: process_machdep.c,v 1.5 2001/02/04 17:38:11 briggs Exp $ */
/* $NetBSD: sys_machdep.c,v 1.3 2000/06/09 14:08:45 kleink Exp $ */
/* $NetBSD: trap.c,v 1.46 2001/06/10 16:31:59 tsubai Exp $ */
/* $NetBSD: trap_subr.S,v 1.6 2001/06/08 00:16:25 matt Exp $ */
/* $NetBSD: trap_subr_mp.S,v 1.2 2001/06/10 11:09:28 tsubai Exp $ */
/* $NetBSD: vm_machdep.c,v 1.28 2001/06/10 11:01:28 tsubai Exp $ */
bash-2.05# gdb my_shlib_test
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc--netbsd"...
(gdb) r -N
Starting program: /usr/local/bin/my_shlib_test -N
trap: kernel read DSI @ 0xe0895c78 by 0x185330 (DSISR 0x40000000)
Stopped in pid 89 (my_shlib_test) at intrctlr_setpl+0x54: lwz r10, r11, 0x44,
db>ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
>How-To-Repeat:
Unknown.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
>Unformatted:
>89 88 89 0 7 0x5806 my_shlib_test
88 87 88 0 3 0x4086 gdb wait
87 85 87 0 3 0x4086 bash wait
85 1 85 0 3 0x4086 crunch wait
84 1 84 0 3 0x84 crunch select
77 1 77 0 3 0x84 crunch select
52 1 52 0 3 0x84 crunch mfsidl
7 0 0 0 3 0x20204 aiodoned aiodone
6 0 0 0 3 0x20204 ioflush syncer
5 0 0 0 3 0x20204 reaper reaper
4 0 0 0 3 0x20204 pagedaemon pgdaemo
1 0 1 0 3 0x4084 crunch wait
0 -1 0 0 3 0x20204 swapper schedul
db> show reg
r0 0xe0895c34
r1 0x1f92b0 ddbstk+0x1950
r2 0
r3 0x20f618 extctlr
r4 0xb
r5 0x53 isisize+0x7
r6 0
r7 0x1f9840 ddbstk+0x1ee0
r8 0xded4 comcnputc
r9 0x1f0000 intstk+0xff0
r10 0x185530 intrctlr_mask
r11 0xe0895c34
r12 0x44000000
r13 0
r14 0
r15 0
r16 0
r17 0
r18 0
r19 0
r20 0
r21 0xffffec14
r22 0xffffec08
r23 0x219450d0
r24 0xffffec8c
r25 0xffffec64
r26 0xffffec6c
r27 0
r28 0x44000000
r29 0x1f9868 ddbstk+0x1f08
r30 0x700 tlbdsmsize+0x618
r31 0x1f92b0 ddbstk+0x1950
iar 0x185330 intrctlr_setpl+0x54
msr 0x1030 tlbdsmsize+0xf48
intrctlr_setpl+0x54: lwz r10, r11, 0x44,
db>ps /a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
>89 my_shlib_test 0x3542ab8 0xefb90000 0x32ed640
db> examine 0x3542ab8+0x1b8 (->pcb)
0x3542c70: efb90000 (i.e. pcb)
db> exam efb90000,4 (pcb)
0xefb90000: e097a480 32fa480 efb93a10 a   (3rd word = SP, 4th word = SPL)
db> exam efb93a10
0xefb93a10: efb93a30
db> exam efb93a30,6
0xefb93a30: efb93a50 previous R01
185cf4
0 R30
8000 R31
0 R01 (real R01 not saved?)
224758 LR (illegal instr. here?)
trap: kernel read DSI @ 0xe0895c78 by 0x185330 (DSISR 0x40000000)
i.e. the translation of the attempted access to 0xe0895c78 was not found in the primary
hash table entry group (HTEG), in the rehashed secondary HTEG, or in the range
of a DBAT register (a page-fault condition). However, examining that address from ddb
returns a value:
db> exam 0xe0895c78
0xe0895c78: 19c4a0
0xefb93a50: efb93a80 1679b8 uvm_pagelookup(uvm_page_i.h#143 after splx(s))
0xefb93a80: efb93ad0 17669c uvn_findpage(uvm_vnode.c#909 after uvm_pagelookup)
0xefb93ad0: efb93b10 1765c0 uvn_findpages(uvm_vnode.c#886)
0xefb93b10: efb93c70 a2404 genfs_getpages(genfs_vnops.c#517)
0xefb93c70: efb93cb0 185640
0xefb93cb0: efb93d00 1859b8
0xefb93d00: efb93d20 159a94 uvmfault_unlockmaps(uvm_fault_i.h#73)
0xefb93d20: efb93d40 1599d8 uvmfault_unlockall(uvm_fault_i.h#96)
0xefb93d40: efb93eb0 158fc0 uvm_fault(uvm_fault.c#1778)
0xefb93eb0: efb93f50 17d358 trap (trap.c#187 case EXC_ISI|EXC_USER)
0xefb93f50: ffffeb50 578c after trapexit
??? at some point the map was invalid
bash-2.05# gdb my_shlib_test
(gdb) shell sysctl -w proc.88.rlimit.datasize.soft=unlimited
proc.88.rlimit.datasize.soft: 33554432 -> unlimited
(gdb) shell sysctl -w proc.88.rlimit.stacksize.soft=unlimited
proc.88.rlimit.stacksize.soft: 1048576 -> unlimited
(gdb) shell sysctl -w proc.88.rlimit.stacksize.hard=unlimited
proc.88.rlimit.stacksize.hard: 33554432 -> unlimited
(gdb) r -N
Starting program: /usr/local/bin/my_shlib_test -N
Stopped at cpu_Debugger+0x18: lwz r11, r1, 0x0,
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
99 88 99 0 4 0x5806 my_shlib_test
88 87 88 0 3 0x4006 gdb vmmaplk
87 85 87 0 3 0x4086 bash wait
85 1 85 0 3 0x4086 crunch wait
84 1 84 0 3 0x84 crunch select
77 1 77 0 3 0x84 crunch select
52 1 52 0 3 0x84 crunch mfsidl
7 0 0 0 3 0x20204 aiodoned aiodone
6 0 0 0 3 0x20204 ioflush syncer
5 0 0 0 3 0x20204 reaper reaper
4 0 0 0 3 0x20204 pagedaemon pgdaemo
1 0 1 0 3 0x4084 crunch wait
0 -1 0 0 3 0x20204 swapper schedul
db> ps /w
PID COMMAND EMUL PRI UTIME STIME WAIT-MSG WAIT-CHANNEL
99 my_shlib_test netbsd 51 0.2 0.3
88 gdb netbsd 4 0.3 0.8 vmmaplk kernel_map_store+0x4
87 bash netbsd 32 0.0 0.2 wait 0x3542008
85 crunch netbsd 32 0.0 0.3 wait 0x32e8c78
84 crunch netbsd 24 0.0 0.0 select selwait
77 crunch netbsd 24 0.4 0.0 select selwait
52 crunch netbsd 32 0.0 0.2 mfsidl 0x41c9900
7 aiodoned netbsd 4 0.0 0.0 aiodoned uvm+0x34
6 ioflush netbsd 40 0.0 0.0 syncer rushjob
5 reaper netbsd 4 0.0 2.0 reaper deadproc
4 pagedaemon netbsd 4 0.0 0.0 pgdaemon uvm+0x28
1 crunch netbsd 32 0.0 0.1 wait 0x32e8000
0 swapper netbsd 4 0.0 0.0 scheduler proc0
db> print kernel_map_store+0x4
1fa09c
db> ps /a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
99 my_shlib_test 0x3542728 0xefb8f000 0x32ed258
88 gdb 0x3542ab8 0xefb8b000 0x32ed640
87 bash 0x3542008 0xefb7a000 0x32ed190
85 crunch 0x32e8c78 0xefb76000 0x32ed0c8
84 crunch 0x3542560 0xefb87000 0x32ed3e8
77 crunch 0x35428f0 0xefb83000 0x32ed578
52 crunch 0x3542398 0xefb7f000 0x32ed320
7 aiodoned 0x32e8ab0 0xefb71000 0x212c90
6 ioflush 0x32e88e8 0xefb6d000 0x212c90
5 reaper 0x32e8720 0xefb69000 0x212c90
4 pagedaemon 0x32e8558 0xefb65000 0x212c90
1 crunch 0x32e8000 0xefb59000 0x32ed000
0 swapper 0x212d58 0x266000 0x212c90
db> show
all buf object registers watches
arptab map page uvmexp
breaks ncache pool vnode
db> show map
MAP 0x1fa098: [0xe0000000->0xf0000000]
#ent=14, sz=263806976, ref=1, version=4563, flags=0x1
pmap=0x22470c(resident=12166)
---------------------------
0x1fa098 is the address of the kernel's vm_map structure (kernel_map_store).
The wait channel address, 0x1fa09c, is &vm_map::lock::lk_interlock.
From ddb one can then do:
db> call wakeup(0x1fa09c)
db> ps
db> cont
It sleeps in lockmgr (kern_lock.c#686); there is an ltsleep() there:
682 lkp->lk_flags |= LK_WANT_EXCL;
683 /*
684 * Wait for shared locks and upgrades to finish.
685 */
686 >>> ACQUIRE(lkp, error, extflags, 0, lkp->lk_sharecount != 0 ||
687 (lkp->lk_flags & LK_WANT_UPGRADE));
688 lkp->lk_flags &= ~LK_WANT_EXCL;
689 if (error)
Another deadlock behavior:
bash-2.05# gdb my_shlib_test
(gdb) r -N
Starting program: /usr/local/bin/my_shlib_test -N
Stopped at cpu_Debugger+0x18: lwz r11, r1, 0x0,
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
89 88 89 0 4 0x5806 my_shlib_test
88 87 88 0 3 0x4006 gdb uvn_fp2
87 85 87 0 3 0x4086 bash wait
85 1 85 0 3 0x4086 crunch wait
84 1 84 0 3 0x84 crunch select
77 1 77 0 3 0x84 crunch select
52 1 52 0 3 0x84 crunch mfsidl
7 0 0 0 3 0x20204 aiodoned aiodone
6 0 0 0 3 0x20204 ioflush syncer
5 0 0 0 3 0x20204 reaper reaper
4 0 0 0 3 0x20204 pagedaemon pgdaemo
1 0 1 0 3 0x4084 crunch wait
0 -1 0 0 3 0x20204 swapper schedul
db> cont
gdb (pid 88) is stuck in uvn_findpage (uvm_vnode.c#946).