Subject: kern/3860: kernel file locking functions not robust vs. debugger
To: None <gnats-bugs@gnats.netbsd.org>
From: John Kohl <jtk@kolvir.arlington-heights.ma.us>
List: netbsd-bugs
Date: 07/13/1997 23:26:44
>Number: 3860
>Category: kern
>Synopsis: kernel file locking functions not robust vs. debugger
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 13 20:35:01 1997
>Last-Modified:
>Originator: John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release: NetBSD-current, 1997/07/13
>Environment:
System: NetBSD pattern.arlington-heights.ma.us 1.2G NetBSD 1.2G (PATTERN) #24: Sun Jul 13 23:05:45 EDT 1997 jtk@pattern.arlington-heights.ma.us:/u4/sandbox/src/sys/arch/i386/compile/PATTERN i386
>Description:
If you debug a program that uses flock() or lockf() and interrupt it
while it is blocked waiting for a lock, then when it resumes the lock
request is retried and wedges the top half of the kernel. This provides
a handy denial-of-service attack for anybody with debugger access.
The call to tsleep() inside vfs_lockf.c:lf_setlock() can wake up with
no error even though the condition has not been signaled. Existing
implementations of sleep()/tsleep() implicitly allow this, yet the code
assumes that it will not happen. The code then loops around and
attempts to add itself to the list of blocking locks a second time,
creating a circularity that hangs the top half of the kernel when
lf_addblock() tries to find the end of the blocking list.
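To make the circularity concrete, here is a small user-space model of a tail append onto a singly linked waiter list. It is only a sketch: struct toy_lock and the toy_* functions are hypothetical stand-ins written for illustration, not the code in vfs_lockf.c, and the bounded walk exists only so the demonstration itself cannot hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's lock structure: only the
 * blocked-list link is modeled. */
struct toy_lock {
	struct toy_lock *block;		/* next waiter blocked on this lock */
};

/* Unguarded tail append: walk to the end of the list, then link
 * `blocked` there.  Loops forever if the list already has a cycle. */
static void
toy_addblock(struct toy_lock *lock, struct toy_lock *blocked)
{
	struct toy_lock *lf = lock;

	assert(lock != NULL && blocked != NULL);
	while (lf->block != NULL)
		lf = lf->block;
	lf->block = blocked;
}

/* Bounded walk, so the demonstration can detect a cycle without
 * hanging.  Returns 1 if no end is found within `limit` links. */
static int
toy_list_is_cyclic(const struct toy_lock *lock, int limit)
{
	const struct toy_lock *lf;
	int steps = 0;

	for (lf = lock; lf != NULL; lf = lf->block)
		if (++steps > limit)
			return 1;
	return 0;
}

/* Queue the same waiter twice, as happens after a wakeup with no
 * error.  Returns 1 if the second append left the list circular. */
static int
toy_requeue_creates_cycle(void)
{
	struct toy_lock holder = { NULL }, waiter = { NULL };

	toy_addblock(&holder, &waiter);	/* first sleep: holder -> waiter */
	if (toy_list_is_cyclic(&holder, 16))
		return 0;		/* not expected: list is still linear */
	toy_addblock(&holder, &waiter);	/* re-add: waiter -> waiter */
	return toy_list_is_cyclic(&holder, 16) && waiter.block == &waiter;
}
```

In this model the second append finds the waiter already at the tail and links it to itself, so any later unguarded walk to the end of the list never terminates.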
>How-To-Repeat:
Run a program that blocks on a lock (see the test program below) under
a debugger. Hit ^C, then continue the program. Watch your machine wedge :(
Here's the output from such a run with LOCKF_DEBUG set and lockf_debug=3
(I added some KASSERTs and extra prints to track down this bug):
lf_setlock: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: got the lock: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: Lock list:
lock 0xf880bd80 for id 0x0xf880be40, shared, start 0, end ffffffffffffffff
lf_setlock: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
checking: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
overlap == lock
lf_clearlock: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_clearlock: Lock list:
lock 0xf880bd40 for id 0x0xf880bdc0, unlock, start 0, end ffffffffffffffff
addblock: adding: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
to blocked list of: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff
lf_setlock: blocking on: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
lf_setlock: Lock list:
lock 0xf880bd80 for id 0x0xf880be40, shared, start 0, end ffffffffffffffff block 0xf880bd40
lock 0xf880bd40 for id 0x0xf880bdc0, exclusive, start 0, end ffffffffffffffff
lf_setlock: wakeup, no error: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
checking: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
overlap == lock
lf_clearlock: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_findoverlap: looking for overlap in: lock 0xf880bd40 for id 0x0xf880bdc0 unlock, start 0, end ffffffffffffffff
lf_clearlock: Lock list:
lock 0xf880bd40 for id 0x0xf880bdc0, unlock, start 0, end ffffffffffffffff
addblock: adding: lock 0xf880bd40 for id 0x0xf880bdc0 exclusive, start 0, end ffffffffffffffff
to blocked list of: lock 0xf880bd80 for id 0x0xf880be40 shared, start 0, end ffffffffffffffff block 0xf880bd40
panic: kernel diagnostic assertion "lf != blocked" failed: file "../../../../kern/vfs_lockf.c", line 675
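/*
 * Test program: take a shared flock() on one descriptor, then block
 * requesting an exclusive flock() on a second descriptor for the same file.
 */
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/file.h>

int
main(int argc, char *argv[])
{
	int fd1, fd2;

	if (argc < 2)
		errx(1, "arg count");
	fd1 = open(argv[1], O_RDWR);
	if (fd1 == -1)
		err(1, "%s", argv[1]);
	fd2 = open(argv[1], O_RDWR);
	if (fd2 == -1)
		err(1, "%s", argv[1]);
	/* The shared lock on fd1 is granted immediately. */
	if (flock(fd1, LOCK_SH) == -1)
		err(1, "flock 1");
	/* This request conflicts with fd1's lock and blocks; interrupt it here. */
	if (flock(fd2, LOCK_EX) == -1)
		err(1, "flock 2");
	close(fd1);
	close(fd2);
	return 0;
}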
>Fix:
Not sure exactly how to code this, but the sleep loop should be
smarter. Maybe 4.4BSD-Lite2 has fixed this bug?
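One possible shape for a smarter loop, shown only as a user-space sketch: on every wakeup, before retrying, the waiter checks whether it is still queued and unlinks itself if so, so the retry can never queue it twice. Everything here is an assumption made for illustration: struct toy_lock, its `next` back-pointer, and the toy_* functions are hypothetical, and this is not claimed to be what 4.4BSD-Lite2 or any kernel actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; field names are illustrative, not the kernel's. */
struct toy_lock {
	struct toy_lock *block;	/* next waiter blocked on this lock */
	struct toy_lock *next;	/* lock this waiter is queued on, or NULL */
};

/* Append `w` to `holder`'s waiter list and record what it waits on. */
static void
toy_enqueue(struct toy_lock *holder, struct toy_lock *w)
{
	struct toy_lock *lf = holder;

	assert(w->next == NULL);	/* must not already be queued */
	while (lf->block != NULL)
		lf = lf->block;
	lf->block = w;
	w->block = NULL;
	w->next = holder;
}

/* Run on every wakeup, before the request is retried: if nobody
 * unlinked us (wakeup without the condition being signaled), unlink
 * ourselves so the retry starts from a clean state. */
static void
toy_dequeue_if_queued(struct toy_lock *w)
{
	struct toy_lock *lf;

	if (w->next == NULL)
		return;			/* an unlocker already removed us */
	for (lf = w->next; lf->block != NULL; lf = lf->block) {
		if (lf->block == w) {
			lf->block = w->block;
			break;
		}
	}
	w->block = NULL;
	w->next = NULL;
}

/* Count waiters with a bound, so a corrupted list cannot hang the
 * caller.  Returns -1 if no end is found within `limit` links. */
static int
toy_count_waiters(const struct toy_lock *holder, int limit)
{
	const struct toy_lock *lf;
	int n = 0;

	for (lf = holder->block; lf != NULL; lf = lf->block)
		if (++n > limit)
			return -1;
	return n;
}

/* Simulate: block, wake up with no error, retry.  Returns the number
 * of waiters queued on the holder afterwards. */
static int
toy_retry_after_spurious_wakeup(void)
{
	struct toy_lock holder = { NULL, NULL }, waiter = { NULL, NULL };

	toy_enqueue(&holder, &waiter);	/* first attempt blocks */
	toy_dequeue_if_queued(&waiter);	/* woke up while still queued */
	toy_enqueue(&holder, &waiter);	/* retry re-adds exactly once */
	return toy_count_waiters(&holder, 16);
}
```

The design choice in this sketch is to make the waiter responsible for its own cleanup rather than trusting that every wakeup came from an unlock; the back-pointer is what lets it tell the two cases apart.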
>Audit-Trail:
>Unformatted: