Subject: kern/36328: clone(2) with CLONE_FILES can leak POSIX locks
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: stix@stix.id.au
List: netbsd-bugs
Date: 05/14/2007 14:40:00
>Number:         36328
>Category:       kern
>Synopsis:       clone(2) with CLONE_FILES can leak POSIX locks
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon May 14 14:40:00 +0000 2007
>Originator:     Paul Ripke
>Release:        NetBSD 4.0_BETA2
>Organization:
>Environment:
System: NetBSD zion.stix.org.au 4.0_BETA2 NetBSD 4.0_BETA2 (ZION) #1: Mon Mar 19 10:58:04 EST 2007 stix@zion.stix.org.au:/export/netbsd/netbsd-4/obj.i386/export/netbsd/netbsd-4/src/sys/arch/i386/compile/ZION i386

Architecture: i386
Machine: i386
>Description:
When a process calls clone(2) with the CLONE_FILES flag and the child
takes out POSIX locks via fcntl(2) F_SETLK/F_SETLKW but leaves the parent
to close(2) the file descriptors (say, at exit(3) time), the locks are
not released. The lock structure is left pointing to an invalid proc
struct, possibly preventing future processes from obtaining locks on
that file.

This was seen while attempting to run some Linux binaries, but can be
duplicated using native clone(2).
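
Until this is fixed, one possible workaround is for the child to release
its own locks before it returns, rather than relying on the parent's
close(2). The sketch below assumes that an explicit F_UNLCK issued by the
process that took the lock is handled correctly; this has not been checked
on the affected release. The helper names whole_file_lock_op() and
lock_then_release() are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: apply l_type (F_WRLCK or F_UNLCK) to the whole file. */
static int
whole_file_lock_op(int fd, short l_type)
{
	struct flock fl;

	assert(fd >= 0);
	memset(&fl, 0, sizeof(fl));
	fl.l_type = l_type;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "through end of file" */
	return fcntl(fd, F_SETLK, &fl);
}

/*
 * Assumed workaround: the process that takes the lock also drops it,
 * so nothing is left for the parent's close(2) to clean up.
 */
static int
lock_then_release(int fd)
{
	if (whole_file_lock_op(fd, F_WRLCK) == -1)
		return -1;
	/* ... work done under the lock would go here ... */
	return whole_file_lock_op(fd, F_UNLCK);
}
```

In the test program, this would amount to clonefunc() issuing the F_UNLCK
request on fd before returning.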

>How-To-Repeat:

Compile the following test program:

# cat cloneandlock.c
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int clonefunc(void *arg);

int
main(int argc, char **argv)
{
	pid_t kid;
	char *kidstack;

	if (argc != 2) {
		fprintf(stderr, "Usage: cloneandlock <file>\n");
		exit(EXIT_FAILURE);
	}
	if ((kidstack = malloc(8192)) == NULL)
		err(EXIT_FAILURE, "malloc");
	fprintf(stderr, "Parent PID: %d\n", getpid());
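	/*
	 * Share the descriptor table with the child. The stack grows down
	 * on i386, so pass an address near the top of the allocation.
	 */
	kid = clone(&clonefunc, kidstack + 8188,
		    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_VFORK, argv[1]);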
	if (kid == -1)
		err(EXIT_FAILURE, "clone");
	return 0;
}

int
clonefunc(void *arg)
{
	struct flock fl;
	int fd;

	fprintf(stderr, "clone start, PID %d\n", getpid());
	if ((fd = open(arg, O_RDWR | O_CREAT, 0644)) == -1)
		err(EXIT_FAILURE, "open");
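	/* Take a write lock on the whole file (l_len of 0 means to EOF). */
	fl.l_start = 0;
	fl.l_len = 0;
	fl.l_pid = getpid();
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	if (fcntl(fd, F_SETLK, &fl) == -1)
		err(EXIT_FAILURE, "fcntl");
	/* Return without unlocking or closing fd; the parent cleans up. */
	return 0;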
}
# cc -Wall -O -o cloneandlock cloneandlock.c

Run it once, start a few background processes (e.g. cat) to use up
process table slots, then run it again:

# ./cloneandlock testfile
Parent PID: 367
clone start, PID 21552
# cat &
[1] 625
# cat &
[2] 370
[1] + Stopped (tty input)  cat 
# ./cloneandlock testfile
Parent PID: 371
clone start, PID 628
cloneandlock: fcntl: Resource temporarily unavailable
# 

With LOCKF_DEBUG set, the above generated the following debug output:

May 14 23:06:31  /netbsd: lf_setlock: lock 0xcb0cfd00 for proc 21552 exclusive, start 0, end ffffffffffffffff
May 14 23:06:31  /netbsd: lf_setlock: got the lock: lock 0xcb0cfd00 for proc 21552 exclusive, start 0, end ffffffffffffffff
May 14 23:06:31  /netbsd: lf_setlock: Lock list:
May 14 23:06:31  /netbsd: lock 0xcb0cfd00 for proc 21552, exclusive, start 0, end ffffffffffffffff

May 14 23:06:36  /netbsd: lf_setlock: lock 0xcb0cfccc for proc 628 exclusive, start 0, end ffffffffffffffff
May 14 23:06:36  /netbsd: lf_findoverlap: looking for overlap in: lock 0xcb0cfccc for proc 628 exclusive, start 0, end ffffffffffffffff
May 14 23:06:36  /netbsd: checking: lock 0xcb0cfd00 for proc 370 exclusive, start 0, end ffffffffffffffff
May 14 23:06:36  /netbsd: overlap == lock

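Note that lock 0xcb0cfd00, taken by process ID 21552 in the first run, is
now attributed to process ID 370, one of the cat(1) processes, and so
blocks the request from process ID 628.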
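
The debug output is consistent with the lock recording its owner as a
pointer to a process-table slot that is later reused. The following is a
rough user-space model of that reading, not kernel code: every name in it
(model_proc, model_lock, model_spawn() and so on) is hypothetical, and the
first-free-slot allocation policy is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NSLOTS 4

/* Hypothetical stand-ins for a proc table entry and a lock record. */
struct model_proc { int pid; int in_use; };
struct model_lock { const struct model_proc *owner; };

static struct model_proc model_slots[MODEL_NSLOTS];

/* Give the first free slot to a new "process" (assumed policy). */
static struct model_proc *
model_spawn(int pid)
{
	size_t i;

	for (i = 0; i < MODEL_NSLOTS; i++) {
		if (!model_slots[i].in_use) {
			model_slots[i].in_use = 1;
			model_slots[i].pid = pid;
			return &model_slots[i];
		}
	}
	return NULL;
}

/* Exit without touching any lock: the lock's owner pointer goes stale. */
static void
model_exit(struct model_proc *p)
{
	assert(p != NULL);
	p->in_use = 0;
}

/* PID the lock appears to belong to, read through the stored pointer. */
static int
model_lock_owner_pid(const struct model_lock *lk)
{
	assert(lk->owner != NULL);
	return lk->owner->pid;
}

/* A request conflicts when the lock is owned by a different slot. */
static int
model_conflicts(const struct model_lock *lk, const struct model_proc *p)
{
	return lk->owner != NULL && lk->owner != p;
}
```

Replaying the session above against this model (367 and 21552 start, 21552
takes the lock, both exit, 625 and 370 start) leaves the lock reporting
PID 370 and conflicting with a later request from 628.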

>Fix:

Linux appears to track a tgid (thread group ID?) for each process, which is
inherited across clone(2) calls, and uses this for lock ownership. I don't
know if a similar solution would be appropriate for NetBSD.
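
To make the idea concrete, here is a minimal user-space sketch of lock
ownership keyed by an inherited group ID instead of by the individual
process. It follows the description above rather than the Linux source,
which has not been checked, and all names (tg_task, tg_lock, tg_clone()
and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task identity: own PID plus an inherited group ID. */
struct tg_task { int pid; int tgid; };
struct tg_lock { int held; int owner_tgid; };

/* An ordinary new process starts its own group. */
static struct tg_task
tg_new_process(int pid)
{
	struct tg_task t = { pid, pid };

	return t;
}

/* A clone(2) child inherits the parent's group ID (as described above). */
static struct tg_task
tg_clone(const struct tg_task *parent, int pid)
{
	struct tg_task t;

	assert(parent != NULL);
	t.pid = pid;
	t.tgid = parent->tgid;
	return t;
}

/* Record the group, not the individual task, as the lock owner. */
static void
tg_take_lock(struct tg_lock *lk, const struct tg_task *t)
{
	lk->held = 1;
	lk->owner_tgid = t->tgid;
}

/* On close, drop the lock if the closing task is in the owning group. */
static int
tg_release_on_close(struct tg_lock *lk, const struct tg_task *t)
{
	if (lk->held && lk->owner_tgid == t->tgid) {
		lk->held = 0;
		return 1;
	}
	return 0;
}
```

Under this scheme a lock taken by the clone child would be released when
the parent closes the descriptor, because both carry the same group ID,
while a close by an unrelated process would leave it in place.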