Subject: kern/13942: deadlock in ufs quota code
To: None <gnats-bugs@gnats.netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: netbsd-bugs
Date: 09/12/2001 22:49:50
>Number: 13942
>Category: kern
>Synopsis: deadlock in ufs quota code
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 12 22:50:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Chuck Silvers
>Release: NetBSD-current Wed Sep 12 22:16:53 PDT 2001
>Organization:
me
>Environment:
NetBSD 1.5X (SPIFFY.debug) #180: Wed Sep 12 22:16:53 PDT 2001 chs@spathi.chuq.com:/home/chs/netbsd/src/origsys/sys/arch/i386/compile/SPIFFY.debug
>Description:
I was enabling quotas on a freshly newfs'd filesystem to test
something else, and I managed to trigger a deadlock in the quota code:
21 spathi2:~ # ls /mnt
quota.user
22 spathi2:~ # quotaon /mnt
23 spathi2:~ # mount
/dev/wd0a on / type ffs (local)
/dev/wd0g on /build type ffs (local, noatime, soft dependencies)
procfs on /proc type procfs (local)
/dev/wd0h on /mnt type ffs (local, with quotas)
24 spathi2:~ # ls -l /mnt
total 0
-rw-r--r-- 1 root wheel 0 Sep 12 14:52 quota.user
25 spathi2:~ # ,
Suspended
5 spathi2:~> cp .cshrc /mnt
6 spathi2:~> fg
nu
26 spathi2:~ # ls -l /mnt
total 17
-rw-r--r-- 1 chs wheel 16671 Sep 12 14:53 .cshrc
-rw-r--r-- 1 root wheel 0 Sep 12 14:52 quota.user
27 spathi2:~ # sync
... and now the sync is hung. Dropping into the kernel debugger (DDB) shows:
Stopped at cpu_Debugger+0x4: leave
db> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
1486 186 1486 0 3 0x4006 sync chkdq
373 196 373 0 3 0x4086 tcsh ttyin
196 195 196 1022 3 0x4086 tcsh pause
195 171 171 0 3 0x4184 rlogind select
186 178 186 0 3 0x5086 tcsh pause
178 177 178 1022 3 0x4086 tcsh pause
177 171 171 0 3 0x4184 rlogind select
176 1 176 0 3 0x4086 getty ttyin
174 1 174 0 3 0x84 cron nanosle
171 1 171 0 3 0x84 inetd select
85 1 85 0 3 0x84 syslogd select
6 0 0 0 3 0x20204 aiodoned aiodone
5 0 0 0 3 0x20204 ioflush syncer
4 0 0 0 3 0x20204 reaper reaper
3 0 0 0 3 0x20204 pagedaemon pgdaemo
2 0 0 0 3 0x20204 pciide0:1 sccomp
1 0 1 0 3 0x4084 init wait
0 -1 0 0 3 0x20204 swapper schedul
db> t/t 0t1486
trace: pid 1486 at 0xcf437a78
bpendtsleep(c0860900,9,c03102be,0,0) at bpendtsleep
chkdq(cf386d04,10,c083b480,0) at chkdq+0xf3
ffs_alloc(cf386d04,0,8,2000,c083b480) at ffs_alloc+0x265
ffs_balloc(cf437c98,cf43bcc8,29,c030ff29,20) at ffs_balloc+0x920
ffs_ballocn(cf437d50,c05ab180,c05aedc0,0,cedcb000) at ffs_ballocn+0xd3
ufs_balloc_range(cf43bc20,0,0,20,0) at ufs_balloc_range+0x948
ffs_write(cf437e90,0,cf43bc20,cf43ba70,c0187bd4) at ffs_write+0x225
dqsync(cf43bc20,c0860900) at dqsync+0x146
qsync(c0877e00) at qsync+0x7d
ffs_sync(c0877e00,2,c083b480,cf335c7c) at ffs_sync+0x1e0
sys_sync(cf335c7c,cf437f80,cf437f78) at sys_sync+0x56
syscall_plain(1f,1f,1f,1f,bfbfdff0) at syscall_plain+0x98
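In order to write out the one quota record, dqsync() calls ffs_write()
on the quota file. quota.user was still empty (0 bytes in the ls output
above), so the write has to allocate space, and ffs_alloc() calls chkdq()
to charge the new block. But to do that, chkdq() needs to update the
quota record that dqsync() has locked because it is in the middle of
writing it out, so sync sleeps in chkdq() forever (note the "chkdq"
wait channel for pid 1486 in the ps output).
This also occurs in 1.5.2.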
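The lock cycle can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: struct toy_dquot and the toy_* functions are hypothetical stand-ins that mimic the call chain in the backtrace (dqsync -> ffs_write -> ffs_alloc -> chkdq); they are not the real NetBSD code. The model assumes, as the description above says, that the record being synced is the same record the new block must be charged against. Instead of sleeping, toy_chkdq() reports the point where the real chkdq() would block forever.

```c
#include <stdbool.h>

/*
 * Hypothetical toy model of the quota deadlock; not the NetBSD
 * implementation.  A record is "locked" while it is being written out.
 */
struct toy_dquot {
	bool locked;        /* set while the record is being written out */
	long blocks_used;   /* blocks charged against this record */
};

/* Returns false where the real chkdq() would sleep forever. */
static bool toy_chkdq(struct toy_dquot *dq, long nblocks)
{
	if (dq->locked)
		return false;   /* waiting for a lock our own caller holds */
	dq->blocks_used += nblocks;
	return true;
}

/* Allocating a block charges it against the given record first. */
static bool toy_alloc(struct toy_dquot *charge_dq)
{
	return toy_chkdq(charge_dq, 1);
}

/* Writing into an empty (unallocated) part of the quota file needs a block. */
static bool toy_write(struct toy_dquot *charge_dq, bool block_allocated)
{
	if (block_allocated)
		return true;
	return toy_alloc(charge_dq);
}

/* Lock the record, write it out, then unlock it. */
static bool toy_dqsync(struct toy_dquot *dq, struct toy_dquot *charge_dq,
    bool block_allocated)
{
	bool ok;

	dq->locked = true;
	ok = toy_write(charge_dq, block_allocated);
	dq->locked = false;
	return ok;
}
```

With a fresh record, toy_dqsync(&rec, &rec, false) returns false: the write needs a block, and charging that block requires the very record that is locked for the write. If the block already exists, or the block is charged against a different, unlocked record, the same call succeeds.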
>How-To-Repeat:
See above.
>Fix:
Left to the reader.
>Release-Note:
>Audit-Trail:
>Unformatted: