Subject: kern/22080: switching with held simple_lock
To: None <gnats-bugs@gnats.netbsd.org>
From: None <mhitch@lightning.msu.montana.edu>
List: netbsd-bugs
Date: 07/06/2003 21:54:55
>Number: 22080
>Category: kern
>Synopsis: switching with held simple_lock
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jul 07 03:55:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Michael L. Hitch
>Release: NetBSD 1.6.1
>Organization:
Michael L. Hitch mhitch@montana.edu
Operations Consulting, Information Technology Center
Montana State University, Bozeman, MT USA
>Environment:
System: NetBSD netbsd2.msu.montana.edu 1.6.1 NetBSD 1.6.1 (CS20.MP) #5: Sat Jun 28 21:49:35 MDT 2003 mhitch@netbsd2.msu.montana.edu:/usr/NetBSD-1.6.1/obj/alphaev56/.alpha/sys/arch/alpha/compile/CS20.MP alpha
Architecture: alpha
Machine: alpha
>Description:
While running a LOCKDEBUG kernel on an API CS20, the following console was displayed
about "switching with held simple_lock". This had previously been reported in
PR 11190, and closed. I was able to get a stack trace in DDB, but was unable to
get a dump file to get a more detailed analysis. This happened during the daily
run. From the stack trace, it looks like checkalias() has obtained a lock on
spechasl_slock, then called vget(), which then tried to get another lock and
called ltsleep and tried to switch processes. An attempt to continue resulted
in a trap, and a further attempt to get a crash dump hung the system.
switching with held simple_lock 0xfffffc00005a3708 CPU 0 /usr/NetBSD-1.6.1/src/sys/kern/vfs_subr.c:1051
simple_lock: locking against myself
lock: 0xfffffc00005a2af8, currently at: /usr/NetBSD-1.6.1/src/sys/kern/sys_generic.c:961
on cpu 0
last locked: /usr/NetBSD-1.6.1/src/sys/kern/kern_synch.c:400
last unlocked: /usr/NetBSD-1.6.1/src/sys/kern/kern_synch.c:594
alpha trace requires known PC =eject=
Stopped in pid 23463 (find) at cpu_Debugger+0x4: ret zero,(ra)
db{0}> t
cpu_Debugger() at cpu_Debugger+0x4
_simple_lock() at _simple_lock+0x130
selwakeup() at selwakeup+0xc0
logwakeup() at logwakeup+0x40
vprintf() at vprintf+0xe0
lock_printf() at lock_printf+0xa0
simple_lock_only_held() at simple_lock_only_held+0x158
simple_lock_switchcheck() at simple_lock_switchcheck+0x20
mi_switch() at mi_switch+0xbc
ltsleep() at ltsleep+0x49c
_lockmgr() at _lockmgr+0x11d8
genfs_lock() at genfs_lock+0x30
vn_lock() at vn_lock+0xf8
vget() at vget+0x1dc
checkalias() at checkalias+0x168
ufs_vinit() at ufs_vinit+0x88
ffs_vget() at ffs_vget+0x380
ufs_lookup() at ufs_lookup+0xe58
lookup() at lookup+0x4a4
namei() at namei+0x49c
sys___lstat13() at sys___lstat13+0x5c
syscall_plain() at syscall_plain+0x158
XentSys() at XentSys+0x5c
--- syscall (280) ---
--- user mode ---
db{0}> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
23465 23462 22900 0 3 0x84084 sort piperd
23464 23462 22900 0 3 0x84084 xargs piperd
>How-To-Repeat:
This has only happened once so far.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
>23463 23462 22900 0 3 0x84004 find vnlock
23462 23284 22900 0 3 0x80084 sh wait
23284 22906 22900 0 3 0x84084 sh wait
23277 272 272 12 3 0x84184 showq select
22913 22911 22900 0 3 0x84184 postdrop netio
22911 22900 22900 0 3 0x84084 sendmail piperd
22909 22900 22900 0 3 0x84084 tee piperd
22906 22900 22900 0 3 0x84084 sh wait
22900 22898 22900 0 3 0x84084 sh wait
22898 284 284 0 3 0x80084 cron piperd
20452 272 272 12 3 0x84184 pickup select
15509 323 323 271 7 0x4007 setiathome
14816 305 305 271 2 0x84007 setiathome
26864 26863 26864 271 3 0x84086 sh ttyin
26863 26862 26862 271 3 0x80086 script ttyin
26862 26861 26862 271 3 0x84086 script ttyin
26861 647 26861 271 3 0x84086 sh wait
17071 17070 17071 271 3 0x84086 sh ttyin
17070 301 17070 271 3 0x84086 sh wait
11435 296 11435 271 3 0x84186 screen-3.9.5 pause
9769 334 9769 271 3 0x84086 sh ttyin
9217 28406 9217 271 3 0x84086 ssh select
688 28406 688 0 4 0x5006 csh
649 647 649 271 3 0x84086 xload select
647 646 647 271 3 0x84086 sh wait
646 279 279 0 3 0x84184 rlogind select
28406 28405 28406 271 3 0x84086 sh wait
28405 279 279 0 3 0x84184 rlogind select
19119 169 169 32767 3 0x80184 httpd lockf
19118 169 169 32767 3 0x80184 httpd lockf
19117 169 169 32767 3 0x80184 httpd lockf
19116 169 169 32767 3 0x80184 httpd lockf
19115 169 169 32767 3 0x80184 httpd select
335 334 335 271 3 0x84086 xload select
334 301 334 271 3 0x84086 sh wait
324 322 324 271 3 0x84086 mon_seti3 nanosle
323 322 323 271 3 0x84086 sh wait
322 301 322 271 3 0x84086 sh wait
306 304 306 271 3 0x84086 mon_seti3 nanosle
305 304 305 271 3 0x84086 sh wait
304 301 304 271 3 0x84086 sh wait
303 302 303 271 3 0x84086 top select
302 301 302 271 3 0x84086 sh wait
301 1 301 271 3 0x80184 screen-3.9.5 select
296 295 296 271 3 0x84086 sh wait
295 293 293 271 3 0x80184 sshd select
293 201 293 0 3 0x80184 sshd netio
289 272 272 12 3 0x84184 qmgr select
287 1 1 0 3 0x84084 getty nanosle
286 1 286 0 3 0x84086 getty ttyin
284 1 284 0 3 0x80084 cron nanosle
279 1 279 0 3 0x80084 inetd select
272 1 272 0 3 0x8418c master select
209 1 209 0 3 0x80084 snmptrapd select
208 169 169 32767 3 0x80184 httpd lockf
207 169 169 32767 3 0x80184 httpd netio
206 169 169 32767 3 0x80184 httpd lockf
205 169 169 32767 3 0x80184 httpd netio
204 169 169 32767 3 0x80184 httpd lockf
201 1 201 0 3 0x80084 sshd select
194 1 194 0 3 0x80084 rwhod netio
184 1 184 0 3 0x80084 ntpd pause
172 1 172 0 3 0x80084 lpd select
169 1 169 0 3 0x80084 httpd select
142 1 142 0 3 0x80084 mountd select
121 1 121 0 3 0x80084 mount_mfs mfsidl
112 1 112 0 3 0x80084 rpcbind select
98 1 98 0 3 0x80084 syslogd select
7 0 0 0 3 0xa0204 aiodoned aiodone
6 0 0 0 3 0xa0204 ioflush biowait
5 0 0 0 3 0x20204 reaper reaper
4 0 0 0 3 0xa0204 pagedaemon pgdaemo
3 0 0 0 3 0xa0204 atapibus0 sccomp
2 0 0 0 3 0xa0204 scsibus0 sccomp
1 0 1 0 3 0x84084 init wait
0 -1 0 0 3 0xa0204 swapper schedul
db{0}> c
alpha trace requires known PC =eject=
S
oCpPpUe d1 :i nf aptiadl 2k3e4r6n3e l( ftirnadp):
t
CcPpUu _1D e b u gtgrearp+ 0exn4t:r y = 0 x 2r e(tm e m o r yz emraon,a(gream)e
td bf{a0u}l>t )
CPU 1 a0 = 0x1b336ce0
CPU 1 a1 = 0x1
CPU 1 a2 = 0xffffffffffffffff
CPU 1 pc = 0x1b336ce0
CPU 1 ra = 0x1b336ce3
CPU 1 pv = 0xfffffc00003c7080
CPU 1 curproc = 0xfffffc0028280bd0
CPU 1 pid = 23463, comm = find
Caught exception in ddb.