Subject: kern/15887: quota error can cause the kernel to panic or lock up.
To: None <gnats-bugs@gnats.netbsd.org>
From: Stephen Jones <smj@otaku.freeshell.org>
List: netbsd-bugs
Date: 03/12/2002 18:54:47
>Number:         15887
>Category:       kern
>Synopsis:       quota error can cause the kernel to panic or lock up.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Mar 12 10:55:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Stephen Jones
>Release:        NetBSD 1.5.1
>Organization:
SDF Public Access UNIX System
>Environment:
DEC Alpha 5305 (AS1200), NetBSD
System: NetBSD otaku 1.5.3_ALPHA NetBSD 1.5.3_ALPHA (OTAKU) #0: Thu Mar 7 21:27:46 UTC 2002 alpha

>Description:

	A quota error can cause the kernel to panic or lock up.

        On our system hundreds of users are created and purged on a daily
        basis.  Each user is assigned a quota.  The quota file contains
        the struct dqblk entries indexed by user or group id (user id in
        our case).  Since we recycle user ids, information for a 'purged'
        user id could become lost or fragmented.  When a user is purged,
        all files for that user are deleted as well, and the uid is freed
        up and put back into a list for new accounts.
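
        To make the indexing concrete, here is a minimal userland sketch
        of how a record is located in the quota file, assuming the record
        for an id sits at byte offset id * sizeof(struct dqblk).  The
        struct below is a simplified stand-in written for this report,
        not the kernel's definition, and dqblk_offset() and dqblk_owner()
        are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for the on-disk quota record.  Field names follow
 * the usual BSD struct dqblk, but the exact layout is an assumption;
 * check <ufs/ufs/quota.h> on the system in question.
 */
struct dqblk_sketch {
	uint32_t dqb_bhardlimit;	/* absolute limit on disk blocks */
	uint32_t dqb_bsoftlimit;	/* preferred limit on disk blocks */
	uint32_t dqb_curblocks;		/* current block count */
	uint32_t dqb_ihardlimit;	/* absolute limit on inodes */
	uint32_t dqb_isoftlimit;	/* preferred limit on inodes */
	uint32_t dqb_curinodes;		/* current inode count */
	int32_t  dqb_btime;		/* grace time for excess disk use */
	int32_t  dqb_itime;		/* grace time for excess files */
};

/* Byte offset in the quota file of the record for a given id. */
static uint64_t
dqblk_offset(uint32_t id)
{
	return (uint64_t)id * sizeof(struct dqblk_sketch);
}

/* Inverse mapping: the id that owns the record at a given offset. */
static uint32_t
dqblk_owner(uint64_t off)
{
	assert(off % sizeof(struct dqblk_sketch) == 0);
	return (uint32_t)(off / sizeof(struct dqblk_sketch));
}
```

        Under this assumption a recycled uid maps to exactly the same
        slot as its previous owner, so whatever was last written there is
        what the new account starts with unless the record is rewritten.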

        Please see How-To-Repeat for information on locking up the system.

        Kernel panics related to quota (the complete message and db output
        are below):

        cpu_reboot() at cpu_reboot+0x68
        panic() at panic+0x194
        trap() at trap+0x50c
        XentMM() at XentMM+0x20
        --- memory management fault (from ipl 0) ---
        dqget() at dqget+0xd4

        This indicates a trap occurred at or near dqget+0xd4.
        In ufs_quota.c we find a candidate for a memory fault:

        /*
         * Check the cache first.
         */
        dqh = DQHASH(dqvp, id);
        for (dq = dqh->lh_first; dq; dq = dq->dq_hash.le_next) {
                if (dq->dq_id != id ||
                    dq->dq_ump->um_quotas[dq->dq_type] != dqvp)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        This appears to be the section of code at dqget+0xd4;
        if dq->dq_ump or dq->dq_type is bad, dereferencing it could
        cause the trap seen there.
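
        The trap frame further down reports a0 = 0x28.  If that is the
        faulting address, as I understand it to be for an Alpha memory
        management fault, it would be consistent with a load through a
        NULL pointer at a small offset.  The following is a userland
        model of the cache walk with the two suspect fields checked
        before use.  It is only a sketch of the idea, not a proposed
        patch: the *_sketch types and dq_lookup() are made up for this
        report and mirror only the fields that the quoted test touches.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQUOTAS 2			/* user and group quotas */

struct vnode_sketch { int v_dummy; };	/* stand-in for struct vnode */

struct ufsmount_sketch {
	struct vnode_sketch *um_quotas[MAXQUOTAS];	/* quota file vnodes */
};

struct dquot_sketch {
	struct dquot_sketch *dq_next;	/* next entry on the hash chain */
	unsigned short dq_type;		/* quota type, index into um_quotas */
	unsigned int dq_id;		/* uid or gid this entry describes */
	struct ufsmount_sketch *dq_ump;	/* filesystem the entry belongs to */
};

/*
 * Walk one hash chain looking for (dqvp, id).  Entries whose dq_ump is
 * NULL or whose dq_type is out of range are skipped instead of being
 * dereferenced, which is the step the quoted kernel loop does not take.
 */
static struct dquot_sketch *
dq_lookup(struct dquot_sketch *head, struct vnode_sketch *dqvp,
    unsigned int id)
{
	struct dquot_sketch *dq;

	assert(dqvp != NULL);
	for (dq = head; dq != NULL; dq = dq->dq_next) {
		if (dq->dq_ump == NULL || dq->dq_type >= MAXQUOTAS)
			continue;	/* stale or corrupt entry */
		if (dq->dq_id != id ||
		    dq->dq_ump->um_quotas[dq->dq_type] != dqvp)
			continue;	/* different id or quota file */
		return dq;
	}
	return NULL;
}
```

        A check like this would only hide the symptom; it does not
        explain how an entry with a bad dq_ump or dq_type ends up on the
        hash chain in the first place.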

db> cont
tlp0: receive ring overrun
ex0: uplistptr was 0
syncing disks... panic: lockmgr: locking against myself
Stopped in mutt_dotlock at      cpu_Debugger+0x4:       ret     zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0xfc
lockmgr() at lockmgr+0x6e8
genfs_lock() at genfs_lock+0x28
vn_lock() at vn_lock+0x64
vget() at vget+0x124
qsync() at qsync+0x98
ffs_sync() at ffs_sync+0x248
sys_sync() at sys_sync+0xb0
vfs_shutdown() at vfs_shutdown+0xa4
cpu_reboot() at cpu_reboot+0x68
panic() at panic+0x194
trap() at trap+0x50c
XentMM() at XentMM+0x20
--- memory management fault (from ipl 0) ---
dqget() at dqget+0xd4
getinoquota() at getinoquota+0x54
ufs_access() at ufs_access+0x84
ufs_lookup() at ufs_lookup+0x678
lookup() at lookup+0x40c
namei() at namei+0x424
vn_open() at vn_open+0x80
sys_open() at sys_open+0xf0
syscall() at syscall+0x1c8
XentSys() at XentSys+0x50
--- syscall (5, netbsd.sys_open) ---
--- user mode ---
db>

fatal kernel trap:

    trap entry = 0x2 (memory management fault)
    a0         = 0x28
    a1         = 0x1
    a2         = 0x0
    pc         = 0xfffffc0000434734
    ra         = 0xfffffc00004330b4
    curproc    = 0xfffffc0026fb8a90
        pid = 9449, comm = mutt_dotlock

panic: trap
tlp0: receive ring overrun
ex0: uplistptr was 0
syncing disks... panic: lockmgr: locking against myself
sd4: cache synchronization failed
rebooting..


>How-To-Repeat:

To cause the kernel to lock up (and forgive me if this seems impossible):
Create a lot of users (about 5000), with quotas enabled and set for each.
Create files in their home directories.
Remove 2500 accounts from the passwd file (don't remove their directories).
Then try this in the root of the filesystem that holds their home directories:

for i in *
do
 chown "$i" "$i"
done

>Fix:

  disable quotas.
>Release-Note:
>Audit-Trail:
>Unformatted: