Subject: implementing closeall via a syscall
To: None <tech-kern@netbsd.org>
From: mouss <usebsd@free.fr>
List: tech-kern
Date: 01/04/2004 23:44:29
This is a multi-part message in MIME format.
--------------070908070407060203050908
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
I have implemented a syscall to close all descriptors from some value to
the max open, that is:
closem(k) will close descriptors k, k+1, ..., max_open_fd
[rationale]
daemons (and other apps) sometimes need to close almost all descriptors.
The primary method to do this was to call close on fds from k to N,
where N is either a fixed value or the result of a function (SOPEN_MAX,
getrlimit, getdtablesize, ...). Unfortunately, this has two problems:
* a program may lower its limits while having a lot of fds open, so the
return value of getrlimit, sysconf, ... does not necessarily cover the
highest open fd inherited from a parent.
* there are too many useless syscalls (one close() per possible fd,
whether it is open or not)
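For illustration, the traditional loop looks roughly like the sketch below. The helper name close_from_by_limit and the FALLBACK_MAXFD cap are made up for this example and are not part of the patch; the cap is only there to keep the loop bounded when the soft limit is unlimited or very large.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Assumed cap, used when the soft limit is unlimited or very large. */
#define FALLBACK_MAXFD 65536

/*
 * Hypothetical helper (not part of the patch): call close() on every
 * slot from lowfd up to the RLIMIT_NOFILE soft limit, open or not.
 * Returns the number of close() calls issued, or -1 on error.
 */
static long
close_from_by_limit(int lowfd)
{
	struct rlimit rl;
	long maxfd, fd, calls = 0;

	if (lowfd < 0 || getrlimit(RLIMIT_NOFILE, &rl) == -1)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY ||
	    rl.rlim_cur > (rlim_t)FALLBACK_MAXFD)
		maxfd = FALLBACK_MAXFD;
	else
		maxfd = (long)rl.rlim_cur;

	for (fd = lowfd; fd < maxfd; fd++) {
		(void)close((int)fd);	/* EBADF on unused slots is ignored */
		calls++;
	}
	return calls;
}
```

Both problems are visible here: any fd at or above the current limit is never reached, and the returned count is the number of syscalls made regardless of how many fds were actually open.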
An alternative is to read /proc/ and close each open fd listed there. This
gets the open fds right, but still consumes many syscalls. While this may be
acceptable, procfs is not necessarily the right place (kernfs maybe?).
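A minimal sketch of that directory-scanning variant follows. The helper name close_from_by_dir is hypothetical, and the directory to scan is passed in because its location is system dependent (for example /dev/fd or a per-process fd directory under /proc); whether such a directory exists and lists only open fds is an assumption about the host system.

```c
#include <dirent.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): close every fd >= lowfd
 * that appears as a numeric entry in fddir.
 * Returns the number of fds closed, or -1 if fddir cannot be opened.
 */
static int
close_from_by_dir(const char *fddir, int lowfd)
{
	DIR *dir;
	struct dirent *de;
	char *end;
	long fd;
	int closed = 0;

	if ((dir = opendir(fddir)) == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		fd = strtol(de->d_name, &end, 10);
		if (end == de->d_name || *end != '\0')
			continue;	/* not a number: ".", "..", etc. */
		if (fd < lowfd || fd > INT_MAX)
			continue;
		if (fd == dirfd(dir))
			continue;	/* do not close the scan's own fd */
		if (close((int)fd) == 0)
			closed++;
	}
	closedir(dir);
	return closed;
}
```

This only calls close() on fds that are really open, but it still costs an open, several directory reads and one close() per fd.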
AIX has an F_CLOSEM cmd to fcntl to do just that. I originally intended
to implement this, but the fcntl code checks that the fd arg is valid, which
is not relevant for the closem() function. Also, I got comments (a very
long time ago) that this would change the semantics of fcntl (which
so far acts on a single fd and doesn't touch other fds), which seems a
reasonable counter-argument. Finally, I'm not aware of any Unix that
followed the AIX path, and chances are none will, so compatibility is not
critical.
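For reference, the AIX-style call would look roughly like this from user space. The wrapper name close_from_fcntl is made up, passing 0 as the third fcntl argument is my assumption about the calling convention, and F_CLOSEM is only defined on systems that provide it, so the sketch reports ENOSYS elsewhere.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical wrapper (not part of the patch): use fcntl(F_CLOSEM)
 * where the platform defines it, otherwise fail with ENOSYS.
 */
static int
close_from_fcntl(int lowfd)
{
#ifdef F_CLOSEM
	return fcntl(lowfd, F_CLOSEM, 0);
#else
	(void)lowfd;		/* unused when F_CLOSEM is unavailable */
	errno = ENOSYS;
	return -1;
#endif
}
```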
Thus the syscall approach...
[name]
How to name the syscall? For now, I just called it closem().
closeall() would be a bad name because the current closeall() closes
_all_ descriptors, so confusion would result if the same name is reused.
Also, closeall may be present in user apps?
Solaris has closefrom(), which does the same thing (using /proc if I'm
not mistaken). So that would be a better name.
[questions]
- is such a thing desired/desirable?
- can someone review the code to check it's correct?
(the libc part is missing, as well as the manpage. I tested it using
syscall() directly, but I might have missed something).
mouss
--------------070908070407060203050908
Content-Type: text/plain;
name="closem.diffs"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="closem.diffs"
*** kern/init_sysent.c.orig Wed Nov 19 13:02:11 2003
--- kern/init_sysent.c Sun Jan 4 22:55:53 2004
***************
*** 954,960 ****
{ 4, s(struct sys_fsync_range_args), 0,
sys_fsync_range }, /* 354 = fsync_range */
{ 0, 0, 0,
! sys_nosys }, /* 355 = filler */
{ 0, 0, 0,
sys_nosys }, /* 356 = filler */
{ 0, 0, 0,
--- 954,960 ----
{ 4, s(struct sys_fsync_range_args), 0,
sys_fsync_range }, /* 354 = fsync_range */
{ 0, 0, 0,
! sys_closem }, /* 355 = closem */
{ 0, 0, 0,
sys_nosys }, /* 356 = filler */
{ 0, 0, 0,
*** kern/kern_descrip.c.orig Sun Jan 4 18:57:42 2004
--- kern/kern_descrip.c Sun Jan 4 22:53:28 2004
***************
*** 610,615 ****
--- 610,646 ----
return (fdrelease(p, fd));
}
+
+ /*
+ * Close multiple file descriptors.
+ */
+ /* ARGSUSED */
+ int
+ sys_closem(struct lwp *l, void *v, register_t *retval)
+ {
+ struct sys_close_args /* {
+ syscallarg(int) fd;
+ } */ *uap = v;
+ int fd;
+ struct filedesc *fdp;
+ struct proc *p;
+ int i;
+
+ p = l->l_proc;
+ fd = SCARG(uap, fd);
+ fdp = p->p_fd;
+
+ if ((u_int) fd >= fdp->fd_nfiles)
+ return (EBADF);
+
+ for (i=fdp->fd_lastfile; i>=fd; i--) {
+ fdrelease(p, i);
+ }
+
+ return 0;
+ }
+
+
/*
* Return status information about a file descriptor.
*/
*** kern/syscalls.c.orig Sun Jan 4 22:00:13 2004
--- kern/syscalls.c Sun Jan 4 22:52:44 2004
***************
*** 494,497 ****
--- 494,498 ----
"#352 (unimplemented sys_sched_get_priority_min)", /* 352 = unimplemented sys_sched_get_priority_min */
"#353 (unimplemented sys_sched_rr_get_interval)", /* 353 = unimplemented sys_sched_rr_get_interval */
"fsync_range", /* 354 = fsync_range */
+ "closem", /* 355 = closem */
};
*** kern/syscalls.master.orig Sun Jan 4 21:53:25 2004
--- kern/syscalls.master Sun Jan 4 21:55:10 2004
***************
*** 708,710 ****
--- 708,715 ----
354 STD { int sys_fsync_range(int fd, int flags, off_t start, \
off_t length); }
+
+ ;
+ ;
+ ;
+ 355 STD { int sys_closem(int fd); }
*** sys/syscall.h.orig Sun Jan 4 22:01:34 2004
--- sys/syscall.h Sun Jan 4 22:54:00 2004
***************
*** 973,977 ****
/* syscall: "fsync_range" ret: "int" args: "int" "int" "off_t" "off_t" */
#define SYS_fsync_range 354
! #define SYS_MAXSYSCALL 355
#define SYS_NSYSENT 512
--- 973,980 ----
/* syscall: "fsync_range" ret: "int" args: "int" "int" "off_t" "off_t" */
#define SYS_fsync_range 354
! /* syscall: "closem" ret: "int" args: "int" */
! #define SYS_closem 355
!
! #define SYS_MAXSYSCALL 356
#define SYS_NSYSENT 512
*** sys/syscallargs.h.orig Sun Jan 4 22:01:43 2004
--- sys/syscallargs.h Sun Jan 4 22:54:09 2004
***************
*** 1476,1481 ****
--- 1476,1483 ----
int sys_close(struct lwp *, void *, register_t *);
+ int sys_closem(struct lwp *, void *, register_t *);
+
int sys_wait4(struct lwp *, void *, register_t *);
int compat_43_sys_creat(struct lwp *, void *, register_t *);
--------------070908070407060203050908--