Subject: kern/12348: deadlock like situation in getnewb
To: None <gnats-bugs@gnats.netbsd.org>
From: None <jam@pobox.com>
List: netbsd-bugs
Date: 03/07/2001 10:44:14
>Number: 12348
>Category: kern
>Synopsis: deadlock like situation in getnewb
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Mar 07 08:45:00 PST 2001
>Closed-Date:
>Last-Modified:
>Originator: Kazushi (Jam) Marukawa
>Release: Feb 23, 2001
>Organization:
none
>Environment:
Machine: Celeron 366, Asus P3V4X, and 512MB memory
System: NetBSD sou.nerv.org NetBSD 1.5S (sou) #7: Sat Feb 24 05:21:30 CST 2001 jam@sou.nerv.org:/usr/src/sys/arch/i386/compile/sou i386
Architecture: i386
Machine: i386
>Description:
This time I hit a deadlock-like situation while writing a
large amount of data to a SCSI disk, so I'm reporting it. I
use a serial console, but it was off when the problem
occurred, so I have no console messages.
First, I noticed that every program I tried to execute hung.
I was using screen, so I could switch among existing windows,
but I could not open new windows or start new programs. For
example, when I typed "ls" nothing happened, and ^C, ^\, and
^Z didn't work. Then screen itself also hung, waiting in
getnewb. However, I could still ping the machine and send
packets through NAT on it.
I waited about 30 minutes hoping it would recover, but it
didn't. So I sent a break through the serial console and
collected the output of trace, ps, and show uvmexp. I have
more information, but I don't know what it means, so for now
I have included only some of it below. If you are interested
and want to see the full log or the result of specific
commands, please ask. Or should I send-pr about this? Thanks.
db> trace
cpu_Debugger(c0f2aa80,e5053560,e5053560,e5092e5c,0) at cpu_Debugger+0x4
comintr(c0f42c00) at comintr+0xcd
Xintr4() at Xintr4+0x70
--- interrupt ---
idle(e5053560) at idle+0x1b
bpendtsleep(e505495c,128,c0486fec,0,0) at bpendtsleep
sigsuspend1(e5053560,e5092f50,e5092f80,0,0) at sigsuspend1+0xf0
sys___sigsuspend14(e5053560,e5092f88,e5092f80) at sys___sigsuspend14+0x36
syscall_plain(1f,bfbf001f,bfbfdc69,0,bfbfdc40) at syscall_plain+0x98
db> c
Stopped at cpu_Debugger+0x4: leave
db> trace
cpu_Debugger(c0f2aa80,e50a4abc,e50a4abc,e50c8da8,0) at cpu_Debugger+0x4
comintr(c0f42c00) at comintr+0xcd
Xintr4() at Xintr4+0x70
--- interrupt ---
idle(e50a4abc) at idle+0x1b
bpendtsleep(c05e710c,118,c048954d,65,0) at bpendtsleep
sys_select(e50a4abc,e50c8f88,e50c8f80) at sys_select+0x309
syscall_plain(1f,1f,0,8,bfbfdc60) at syscall_plain+0x98
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
23132 193 193 0 3 0x4 sshd getnewb
23131 9668 23130 0 3 0x4006 less getnewb
23130 9668 23130 0 3 0x4006 ps getnewb
23129 9560 9560 1000 3 0x4006 fetchmail getnewb
23128 21992 21992 1000 3 0x4006 perl getnewb
23105 277 23104 1000 3 0x4006 perl getnewb
23038 214 214 32767 3 0x184 httpd netcon
23036 214 214 32767 3 0x184 httpd netcon
23035 214 214 32767 3 0x184 httpd netcon
22994 214 214 32767 3 0x184 httpd netcon
22971 214 214 32767 3 0x104 httpd getnewb
22962 214 214 32767 3 0x184 httpd netcon
22930 214 214 32767 3 0x184 httpd netcon
22859 9668 22859 0 4 0x5002 vi
21992 325 21992 1000 3 0x4086 sh wait
21106 266 21106 0 3 0x4006 dd getnewb
9668 270 9668 0 3 0x5086 zsh pause
9560 264 9560 1000 3 0x5086 mush wait
1163 1162 1161 0 4 0x4002 less
1162 1161 1161 0 4 0x4082 sh
1161 266 1161 0 4 0x5082 man
325 263 325 1000 3 0x4082 zsh pause
270 263 270 1000 3 0x4082 zsh pause
266 265 266 0 3 0x4082 zsh pause
265 263 265 1000 3 0x4082 zsh pause
263 1 263 1000 3 0x104 screen-3.9.8 getnewb
238 1 1 0 3 0x4004 getty vnlock
237 1 237 0 3 0x4006 getty getnewb
236 206 236 32767 3 0x4184 pinger select
231 1 231 0 3 0x4 inetd vnlock
224 1 224 0 3 0x4 sendmail vnlock
214 1 214 0 3 0x84 httpd select
211 206 211 32767 3 0x4080 unlinkd netio
206 202 7 0 3 0x4106 squid getnewb
202 1 7 0 3 0x4082 sh wait
193 1 193 0 3 0x4 sshd vnlock
12 0 0 0 3 0x20204 raid rfwcond
6 0 0 0 3 0x20204 aiodoned aiodone
5 0 0 0 3 0x20204 ioflush getnewb
4 0 0 0 3 0x20204 reaper reaper
3 0 0 0 3 0x20204 pagedaemon pgdaemo
2 0 0 0 3 0x20204 usb0 usbevt
1 0 1 0 3 0x4080 init wait
0 -1 0 0 3 0x20204 swapper schedul
db> kill CE <- squid in getnewb
db> kill 48B <- less in normal status, but I could not kill it either
db> show uvmexp
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
127787 VM pages: 78530 active, 36887 inactive, 973 wired, 140 free
14843 anon, 101138 vnode, 0 vtext
freemin=64, free-target=85, inactive-target=38518, wired-max=42595
faults=6992210, traps=85475941, intrs=83588156, ctxswitch=59582042
softint=78595794, syscalls=292274228, swapins=1434, swapouts=1467
fault counts:
noram=3, noanon=0, pgwait=0, pgrele=0
ok relocks(total)=25163(25163), anget(retrys)=613856(3), amapcopy=241944
neighbor anon/obj pg=602787/3113099, gets(lock/unlock)=941513/25164
cases: anon=416621, anoncow=197235, obj=819333, prcopy=122176, przero=2197959
daemon and swap counts:
woke=797, revs=797, scans=5326460, obscans=4571110, anscans=3436
busy=0, freed=4571111, reactivate=69923, deactivate=5554947
pageouts=222, pending=222, nswget=3
nswapdev=1, nanon=271923, nanonneeded=271923freeanon=257080
swpages=152451, swpginuse=879, swpgonly=0 paging=0
kernel pointers:
objs(kern/kmem/mb)=0xc05cdd40/0xc05cde50/0xc05cde68
db> sync
syncing disks...
(I waited more than 5 minutes with no disk I/O, then sent
break repeatedly and got no response, so I reset the machine.)
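For reference, the pids given to "kill" in the ddb session above appear
to be hexadecimal, which is ddb's default input radix as far as I know.
That matches the ps listing: CE is pid 206 (squid) and 48B is pid 1163
(less). A minimal sketch of the conversion in Python follows; the helper
name is only for illustration and the radix is an assumption.

```python
# Arguments as typed at the ddb prompt, with the command names taken
# from the ps listing in this report.
ddb_args = {"CE": "squid", "48B": "less"}


def ddb_pid_to_decimal(text):
    """Interpret a ddb numeric argument as hexadecimal (assumed radix 16)."""
    return int(text, 16)


for arg, command in ddb_args.items():
    print(f"{arg} -> pid {ddb_pid_to_decimal(arg)} ({command})")
```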
Regards,
>How-To-Repeat:
What I was doing was "dd if=/dev/zero of=/dev/sd0 count=XXXX bs=8192"
on a SCSI HDD bigger than 40G. I thought I could zero-fill a hard drive
this way before selling it.
I received an email from Chuck Silvers <chuq@chuq.com> suggesting that
I use rsd0* instead and send a PR.
sd0 was much faster than rsd0 in my case, which is why I chose it. Thank
you for the workaround.
>Fix:
Chuck said:
please send a PR about the new problem if you haven't already.
(this new problem is unrelated to the previous "uvn_fp1" hang.)
He may know the cause or have an idea about this. Thanks.
>Release-Note:
>Audit-Trail:
>Unformatted: