Subject: kern/13942: deadlock in ufs quota code
To: None <gnats-bugs@gnats.netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: netbsd-bugs
Date: 09/12/2001 22:49:50
>Number:         13942
>Category:       kern
>Synopsis:       deadlock in ufs quota code
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 12 22:50:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Chuck Silvers
>Release:        NetBSD-current Wed Sep 12 22:16:53 PDT 2001
>Organization:
me
>Environment:
NetBSD 1.5X (SPIFFY.debug) #180: Wed Sep 12 22:16:53 PDT 2001 chs@spathi.chuq.com:/home/chs/netbsd/src/origsys/sys/arch/i386/compile/SPIFFY.debug


>Description:
	I was enabling quotas on a freshly newfs'd filesystem to test
	something else and I managed to trigger a deadlock in the quota code:

21 spathi2:~ # ls /mnt
quota.user
22 spathi2:~ # quotaon /mnt
23 spathi2:~ # mount
/dev/wd0a on / type ffs (local)
/dev/wd0g on /build type ffs (local, noatime, soft dependencies)
procfs on /proc type procfs (local)
/dev/wd0h on /mnt type ffs (local, with quotas)
24 spathi2:~ # ls -l /mnt
total 0
-rw-r--r--  1 root  wheel  0 Sep 12 14:52 quota.user
25 spathi2:~ # ,

Suspended
5 spathi2:~> cp .cshrc /mnt
6 spathi2:~> fg
nu
26 spathi2:~ # ls -l /mnt
total 17
-rw-r--r--  1 chs   wheel  16671 Sep 12 14:53 .cshrc
-rw-r--r--  1 root  wheel      0 Sep 12 14:52 quota.user
27 spathi2:~ # sync

... and now the sync is hung.

Stopped at      cpu_Debugger+0x4:       leave
db> ps
 PID             PPID       PGRP        UID S   FLAGS          COMMAND    WAIT
 1486             186       1486          0 3  0x4006             sync   chkdq
 373              196        373          0 3  0x4086             tcsh   ttyin
 196              195        196       1022 3  0x4086             tcsh   pause
 195              171        171          0 3  0x4184          rlogind  select
 186              178        186          0 3  0x5086             tcsh   pause
 178              177        178       1022 3  0x4086             tcsh   pause
 177              171        171          0 3  0x4184          rlogind  select
 176                1        176          0 3  0x4086            getty   ttyin
 174                1        174          0 3    0x84             cron nanosle
 171                1        171          0 3    0x84            inetd  select
 85                 1         85          0 3    0x84          syslogd  select
 6                  0          0          0 3 0x20204         aiodoned aiodone
 5                  0          0          0 3 0x20204          ioflush  syncer
 4                  0          0          0 3 0x20204           reaper  reaper
 3                  0          0          0 3 0x20204       pagedaemon pgdaemo
 2                  0          0          0 3 0x20204        pciide0:1  sccomp
 1                  0          1          0 3  0x4084             init    wait
 0                 -1          0          0 3 0x20204          swapper schedul
db> t/t 0t1486
trace: pid 1486 at 0xcf437a78
bpendtsleep(c0860900,9,c03102be,0,0) at bpendtsleep
chkdq(cf386d04,10,c083b480,0) at chkdq+0xf3
ffs_alloc(cf386d04,0,8,2000,c083b480) at ffs_alloc+0x265
ffs_balloc(cf437c98,cf43bcc8,29,c030ff29,20) at ffs_balloc+0x920
ffs_ballocn(cf437d50,c05ab180,c05aedc0,0,cedcb000) at ffs_ballocn+0xd3
ufs_balloc_range(cf43bc20,0,0,20,0) at ufs_balloc_range+0x948
ffs_write(cf437e90,0,cf43bc20,cf43ba70,c0187bd4) at ffs_write+0x225
dqsync(cf43bc20,c0860900) at dqsync+0x146
qsync(c0877e00) at qsync+0x7d
ffs_sync(c0877e00,2,c083b480,cf335c7c) at ffs_sync+0x1e0
sys_sync(cf335c7c,cf437f80,cf437f78) at sys_sync+0x56
syscall_plain(1f,1f,1f,1f,bfbfdff0) at syscall_plain+0x98


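In order to write out the one quota record, dqsync has to allocate space
in the quota file. But allocating that space calls chkdq, which needs to
update the same quota record that we already hold locked because we are
in the middle of writing it out. So chkdq sleeps waiting for a lock held
by its own caller, and the sync never returns.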
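
To make the recursion explicit, here is a minimal user-space sketch of the
call chain in the trace above. It is a model under stated assumptions, not
the kernel code: the *_model names and the struct are hypothetical, the
DQ_LOCK flag is assumed to behave like a simple non-recursive busy bit, and
the point where the kernel would sleep on "chkdq" is replaced by returning
-1 so that the example terminates.

```c
/* Assumed lock bit: set while a quota record is being written out. */
#define DQ_LOCK 0x01

/* Hypothetical stand-in for an in-core quota record. */
struct dquot_model {
    int  dq_flags;      /* DQ_LOCK while the record is being synced */
    long dq_curblocks;  /* blocks currently charged to this id */
};

/*
 * Charge `change` blocks to the record. The kernel would sleep here
 * until the lock is released; the model returns -1 instead, because
 * the only holder is our own caller and nothing would ever wake us.
 */
static int chkdq_model(struct dquot_model *dq, long change)
{
    if (dq->dq_flags & DQ_LOCK)
        return -1;              /* would sleep forever */
    dq->dq_curblocks += change;
    return 0;
}

/*
 * Write the record into the quota file. If the file needs a new
 * block, the allocation is charged through chkdq_model, as in the
 * ffs_write -> ffs_alloc -> chkdq part of the trace.
 */
static int quota_write_model(struct dquot_model *dq, int needs_alloc)
{
    return needs_alloc ? chkdq_model(dq, 1) : 0;
}

/* Lock the record, write it out, unlock it. */
static int dqsync_model(struct dquot_model *dq, int needs_alloc)
{
    int error;

    dq->dq_flags |= DQ_LOCK;
    error = quota_write_model(dq, needs_alloc);
    dq->dq_flags &= ~DQ_LOCK;
    return error;
}
```

With a zeroed record, dqsync_model(&dq, 1) returns -1, which corresponds to
the hang reported here. dqsync_model(&dq, 0) returns 0, which is presumably
why this only shows up when the quota file still has no blocks allocated,
as with the zero-length quota.user above.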

This also occurs in 1.5.2.



>How-To-Repeat:
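	See the transcript in the Description: on a freshly newfs'd
	filesystem with an empty quota.user, run quotaon, copy a file onto
	the filesystem as a non-root user, then run sync as root.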

>Fix:
	left to the reader.
>Release-Note:
>Audit-Trail:
>Unformatted: