Subject: kern/4269: msync() doesn't conform to POSIX
To: None <gnats-bugs@gnats.netbsd.org>
From: None <cgd@NetBSD.ORG>
List: netbsd-bugs
Date: 10/13/1997 20:56:46
>Number: 4269
>Category: kern
>Synopsis: msync() doesn't conform to POSIX
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Mon Oct 13 14:05:04 1997
>Last-Modified:
>Originator: Chris G. Demetriou
>Organization:
Kernel Hackers 'r' Us
>Release: 1.2G, but a few months old.
>Environment:
System: NetBSD brick.demetriou.com 1.2G NetBSD 1.2G (BRICK) #116: Wed Jul 16 14:03:06 PDT 1997 cgd@brick.demetriou.com:/usr/src/sys/arch/i386/compile/BRICK i386
>Description:
msync() doesn't conform to POSIX. IEEE Std 1003.1b-1993 specifies
that msync() is to be invoked:
msync(void *addr, size_t size, int flags);
but our msync() only has two arguments. This causes various
third party applications which want to use msync() to break.
(The one that I run into often is lmbench.)
>How-To-Repeat:
Try to compile lmbench's lat_pagefault, or compile other programs
which want a POSIX msync().
>Fix:
Below are some diffs which implement a fix. I'd suggest that they
be applied (with the following as suggestions for what else
to do while they're being applied):
* vm_swap_12.c should really be renamed vm_12.c.
* the new msync() syscall number might have to be
re-chosen; I didn't know which to use.
* the syscall tables & headers need to be re-gen'd after
this patch is applied.
* The libc major number probably has to be bumped for this
change to work right, or some other backward-compatibility
magic needs to be done.
The documentation update could use a bit of cleanup.
Some notes about the diffs:
* the changes to vm_map_clean() should be OK; as far
as I can tell, nothing but msync() uses it anyway.
* the standard and other OSes' manual pages (at least
the ones I checked) are unclear as to which combinations
of flags are allowed. The behaviour of this code was
decided based on the results of calling the Digital UNIX
msync() with a bunch of flags.
* the "Disallow wrap-wround" test was simply wrong. There's
no reason that a size_t has to be the same size as
an int, etc.
* A few error returns were changed to match the standard,
and a new VM internal error code was added so I wouldn't
have to overload a different code.
* Use of MS_ASYNC would cause the process to hang with
"vospgw". The code path it takes in that case looks
... severly broken, and I'm not going to debug it, so
i hacked the code so that it always does sync I/O.
There are a few source files in the tree which use msync(). They
can be found easily enough with grep. There are also a few sets of
binary emulation code which stub out an msync() call for their OSes.
They might benefit from this code, as well.
Index: sys/compat/common/vm_swap_12.c
===================================================================
RCS file: /cvsroot/src/sys/compat/common/vm_swap_12.c,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 vm_swap_12.c
*** vm_swap_12.c 1997/07/08 00:45:15 1.1.1.1
--- vm_swap_12.c 1997/10/10 23:19:20
***************
*** 37,42 ****
--- 37,43 ----
#include <sys/mount.h> /* needed for next include! */
#include <sys/syscallargs.h>
+ #include <sys/mman.h>
#include <vm/vm_swap.h>
int
***************
*** 54,57 ****
--- 55,76 ----
SCARG(&ua, arg) = (void *)SCARG(uap, name);
SCARG(&ua, misc) = 0; /* priority */
return (sys_swapctl(p, &ua, retval));
+ }
+
+ int
+ compat_12_sys_msync(p, v, retval)
+ struct proc *p;
+ void *v;
+ register_t *retval;
+ {
+ struct sys_msync_args ua;
+ struct compat_12_sys_msync_args /* {
+ syscallarg(caddr_t) addr;
+ syscallarg(size_t) len;
+ } */ *uap = v;
+
+ SCARG(&ua, addr) = SCARG(uap, addr);;
+ SCARG(&ua, len) = SCARG(uap, len);;
+ SCARG(&ua, flags) = MS_SYNC | MS_INVALIDATE;
+ return (sys_msync(p, &ua, retval));
}
Index: sys/kern/syscalls.master
===================================================================
RCS file: /cvsroot/src/sys/kern/syscalls.master,v
retrieving revision 1.1.1.4
diff -c -r1.1.1.4 syscalls.master
*** syscalls.master 1997/07/08 00:46:26 1.1.1.4
--- syscalls.master 1997/10/10 23:21:38
***************
*** 139,145 ****
63 COMPAT_43 { int sys_getkerninfo(int op, char *where, int *size, \
int arg); } ogetkerninfo
64 COMPAT_43 { int sys_getpagesize(void); } ogetpagesize
! 65 STD { int sys_msync(caddr_t addr, size_t len); }
66 STD { int sys_vfork(void); }
67 OBSOL vread
68 OBSOL vwrite
--- 139,145 ----
63 COMPAT_43 { int sys_getkerninfo(int op, char *where, int *size, \
int arg); } ogetkerninfo
64 COMPAT_43 { int sys_getpagesize(void); } ogetpagesize
! 65 COMPAT_12 { int sys_msync(caddr_t addr, size_t len); }
66 STD { int sys_vfork(void); }
67 OBSOL vread
68 OBSOL vwrite
***************
*** 494,496 ****
--- 494,497 ----
270 STD { int sys_posix_rename(const char *from, \
const char *to); }
271 STD { int sys_swapctl(int cmd, void *arg, int misc); }
+ 272 STD { int sys_msync(void *addr, size_t len, int flags); }
Index: sys/sys/mman.h
===================================================================
RCS file: /cvsroot/src/sys/sys/mman.h,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 mman.h
*** mman.h 1997/04/19 01:23:09 1.1.1.1
--- mman.h 1997/10/13 18:47:30
***************
*** 79,84 ****
--- 79,91 ----
#define MADV_WILLNEED 3 /* will need these pages */
#define MADV_DONTNEED 4 /* dont need these pages */
+ /*
+ * Flags to msync
+ */
+ #define MS_ASYNC 0x01 /* perform asynchronous writes */
+ #define MS_SYNC 0x02 /* perform synchronous writes */
+ #define MS_INVALIDATE 0x04 /* invalidate cached data */
+
#ifndef _KERNEL
#include <sys/cdefs.h>
***************
*** 88,94 ****
caddr_t mmap __P((caddr_t, size_t, int, int, int, off_t));
int mprotect __P((caddr_t, size_t, int));
int munmap __P((caddr_t, size_t));
! int msync __P((caddr_t, size_t));
int mlock __P((caddr_t, size_t));
int munlock __P((caddr_t, size_t));
int madvise __P((caddr_t, size_t, int));
--- 95,101 ----
caddr_t mmap __P((caddr_t, size_t, int, int, int, off_t));
int mprotect __P((caddr_t, size_t, int));
int munmap __P((caddr_t, size_t));
! int msync __P((void *, size_t, int));
int mlock __P((caddr_t, size_t));
int munlock __P((caddr_t, size_t));
int madvise __P((caddr_t, size_t, int));
Index: sys/vm/vm_map.c
===================================================================
RCS file: /cvsroot/src/sys/vm/vm_map.c,v
retrieving revision 1.1.1.3
diff -c -r1.1.1.3 vm_map.c
*** vm_map.c 1997/07/08 00:48:26 1.1.1.3
--- vm_map.c 1997/10/11 02:24:31
***************
*** 1403,1409 ****
}
/*
! * Make a first pass to check for holes.
*/
for (current = entry; current->start < end; current = current->next) {
if (current->is_sub_map) {
--- 1403,1410 ----
}
/*
! * Make a first pass to check for holes and (if invalidating)
! * wired pages.
*/
for (current = entry; current->start < end; current = current->next) {
if (current->is_sub_map) {
***************
*** 1415,1420 ****
--- 1416,1425 ----
current->end != current->next->start)) {
vm_map_unlock_read(map);
return(KERN_INVALID_ADDRESS);
+ }
+ if (current->wired_count) {
+ vm_map_unlock_read(map);
+ return(KERN_PAGES_LOCKED);
}
}
Index: sys/vm/vm_mmap.c
===================================================================
RCS file: /cvsroot/src/sys/vm/vm_mmap.c,v
retrieving revision 1.1.1.2
diff -c -r1.1.1.2 vm_mmap.c
*** vm_mmap.c 1997/07/08 00:48:27 1.1.1.2
--- vm_mmap.c 1997/10/13 17:20:06
***************
*** 279,301 ****
register_t *retval;
{
struct sys_msync_args /* {
! syscallarg(caddr_t) addr;
syscallarg(size_t) len;
} */ *uap = v;
vm_offset_t addr;
vm_size_t size, pageoff;
vm_map_t map;
! int rv;
boolean_t syncio, invalidate;
addr = (vm_offset_t)SCARG(uap, addr);
size = (vm_size_t)SCARG(uap, len);
#ifdef DEBUG
if (mmapdebug & (MDB_FOLLOW|MDB_SYNC))
! printf("msync(%d): addr %lx len %lx\n",
! p->p_pid, addr, size);
#endif
/*
* Align the address to a page boundary,
* and adjust the size accordingly.
--- 279,311 ----
register_t *retval;
{
struct sys_msync_args /* {
! syscallarg(void *) addr;
syscallarg(size_t) len;
+ syscallarg(int) flags;
} */ *uap = v;
vm_offset_t addr;
vm_size_t size, pageoff;
vm_map_t map;
! int rv, flags;
boolean_t syncio, invalidate;
addr = (vm_offset_t)SCARG(uap, addr);
size = (vm_size_t)SCARG(uap, len);
+ flags = SCARG(uap, flags);
#ifdef DEBUG
if (mmapdebug & (MDB_FOLLOW|MDB_SYNC))
! printf("msync(%d): addr %lx len %lx flags %x\n",
! p->p_pid, addr, size, flags);
#endif
+ /* sanity check flags */
+ if ((flags & ~(MS_ASYNC | MS_SYNC | MS_INVALIDATE)) != 0 ||
+ (flags & (MS_ASYNC | MS_SYNC | MS_INVALIDATE)) == 0 ||
+ (flags & (MS_ASYNC | MS_SYNC)) == (MS_ASYNC | MS_SYNC))
+ return (EINVAL);
+ if ((flags & (MS_ASYNC | MS_SYNC)) == 0)
+ flags |= MS_SYNC;
+
/*
* Align the address to a page boundary,
* and adjust the size accordingly.
***************
*** 306,313 ****
size = (vm_size_t) round_page(size);
/* Disallow wrap-around. */
! if (addr + (int)size < addr)
! return (EINVAL);
map = &p->p_vmspace->vm_map;
/*
--- 316,323 ----
size = (vm_size_t) round_page(size);
/* Disallow wrap-around. */
! if (addr + size < addr)
! return (ENOMEM);
map = &p->p_vmspace->vm_map;
/*
***************
*** 325,331 ****
rv = vm_map_lookup_entry(map, addr, &entry);
vm_map_unlock_read(map);
if (rv == FALSE)
! return (EINVAL);
addr = entry->start;
size = entry->end - entry->start;
}
--- 335,341 ----
rv = vm_map_lookup_entry(map, addr, &entry);
vm_map_unlock_read(map);
if (rv == FALSE)
! return (ENOMEM);
addr = entry->start;
size = entry->end - entry->start;
}
***************
*** 334,350 ****
printf("msync: cleaning/flushing address range [%lx-%lx)\n",
addr, addr+size);
#endif
/*
! * Could pass this in as a third flag argument to implement
! * Sun's MS_ASYNC.
*/
syncio = TRUE;
! /*
! * XXX bummer, gotta flush all cached pages to ensure
! * consistency with the file system cache. Otherwise, we could
! * pass this in to implement Sun's MS_INVALIDATE.
! */
! invalidate = TRUE;
/*
* Clean the pages and interpret the return value.
*/
--- 344,363 ----
printf("msync: cleaning/flushing address range [%lx-%lx)\n",
addr, addr+size);
#endif
+
+ #if 0
/*
! * XXX Asynchronous msync() causes:
! * . the process to hang on wchan "vospgw", and
! * . a "vm_object_page_clean: pager_put error" message to
! * be printed by the kernel.
*/
+ syncio = (flags & MS_SYNC) ? TRUE : FALSE;
+ #else
syncio = TRUE;
! #endif
! invalidate = (flags & MS_INVALIDATE) ? TRUE : FALSE;
!
/*
* Clean the pages and interpret the return value.
*/
***************
*** 353,361 ****
case KERN_SUCCESS:
break;
case KERN_INVALID_ADDRESS:
! return (EINVAL); /* Sun returns ENOMEM? */
case KERN_FAILURE:
return (EIO);
default:
return (EINVAL);
}
--- 366,376 ----
case KERN_SUCCESS:
break;
case KERN_INVALID_ADDRESS:
! return (ENOMEM);
case KERN_FAILURE:
return (EIO);
+ case KERN_PAGES_LOCKED:
+ return (EBUSY);
default:
return (EINVAL);
}
Index: sys/vm/vm_param.h
===================================================================
RCS file: /cvsroot/src/sys/vm/vm_param.h,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 vm_param.h
*** vm_param.h 1997/04/19 01:16:19 1.1.1.1
--- vm_param.h 1997/10/10 23:15:14
***************
*** 124,129 ****
--- 124,130 ----
#define KERN_RESOURCE_SHORTAGE 6
#define KERN_NOT_RECEIVER 7
#define KERN_NO_ACCESS 8
+ #define KERN_PAGES_LOCKED 9
#ifndef ASSEMBLER
/*
Index: lib/libc/sys/msync.2
===================================================================
RCS file: /cvsroot/src/lib/libc/sys/msync.2,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 msync.2
*** msync.2 1997/04/27 16:56:24 1.1.1.1
--- msync.2 1997/10/11 04:12:08
***************
*** 33,55 ****
.\"
.\" @(#)msync.2 8.1 (Berkeley) 6/9/93
.\"
! .Dd June 9, 1993
.Dt MSYNC 2
! .Os
.Sh NAME
.Nm msync
.Nd synchronize a mapped region
.Sh SYNOPSIS
- .Fd #include <sys/types.h>
.Fd #include <sys/mman.h>
.Ft int
! .Fn msync "caddr_t addr" "size_t len"
.Sh DESCRIPTION
The
.Fn msync
! system call
! writes any modified pages back to the filesystem and updates
! the file modification time.
If
.Fa len
is 0, all modified pages within the region containing
--- 33,56 ----
.\"
.\" @(#)msync.2 8.1 (Berkeley) 6/9/93
.\"
! .Dd October 10, 1997
.Dt MSYNC 2
! .Os NetBSD 1.3
.Sh NAME
.Nm msync
.Nd synchronize a mapped region
.Sh SYNOPSIS
.Fd #include <sys/mman.h>
.Ft int
! .Fn msync "void *addr" "size_t len" "int flags"
.Sh DESCRIPTION
The
.Fn msync
! system call writes all pages with shared modifications
! in the specified
! region of the process's address space back to permanent
! storage, and, if requested, invalidates cached data mapped
! in the region.
If
.Fa len
is 0, all modified pages within the region containing
***************
*** 57,95 ****
will be flushed;
if
.Fa len
! is non-zero, only the pages containing
.Fa addr
and
.Fa len
! succeeding locations will be examined.
Any required synchronization of memory caches
will also take place at this time.
Filesystem operations on a file that is mapped for shared modifications
are unpredictable except after an
.Fn msync .
.Sh ERRORS
The following errors may be reported:
.Bl -tag -width Er
! .It Bq Er EINVAL
The
! .Fa addr
! parameter was not page aligned.
.It Bq Er EINVAL
! The
! .Fa addr
! parameter did not specify an address part of a mapped region.
.It Bq Er EINVAL
The
! .Fa len
! parameter was negative.
.It Bq Er EIO
! An I/O error occured while writing to the file system.
.Sh SEE ALSO
! .Xr madvise 2 ,
! .Xr munmap 2 ,
! .Xr mprotect 2 ,
! .Xr mincore 2
.Sh HISTORY
The
.Fn msync
! function first appeared in 4.4BSD.
--- 58,123 ----
will be flushed;
if
.Fa len
! is non-zero, only modified pages containing
.Fa addr
and
.Fa len
! succeeding locations will be flushed.
Any required synchronization of memory caches
will also take place at this time.
Filesystem operations on a file that is mapped for shared modifications
are unpredictable except after an
.Fn msync .
+ .Pp
+ The
+ .Fa flags
+ argument is formed by
+ .Em or Ns 'ing
+ the following values
+ .Pp
+ .Bd -literal -offset indent -compact
+ MS_ASYNC Perform asynchronous writes.
+ MS_SYNC Perform synchronous writes.
+ MS_INVALIDATE Invalidate cached data after writing.
+ .Ed
.Sh ERRORS
The following errors may be reported:
.Bl -tag -width Er
! .It Bq Er EBUSY
The
! .Dv MS_INVALIDATE
! flag was specified and a portion of the specified region
! was locked with
! .Xr mlock 2 .
.It Bq Er EINVAL
! The specified
! .Fa flags
! argument was invalid.
.It Bq Er EINVAL
The
! .Fa addr
! parameter was not page aligned.
.It Bq Er EIO
! An I/O error occured while writing.
! .It Bq Er ENOMEM
! Addresses in the specified region are outside the range allowed
! for the address space of the process, or specify one or more pages
! which are unmapped.
.Sh SEE ALSO
! .Xr mlock 2 ,
! .Xr mmap 2 ,
! .Xr munlock 2
! .Sh BUGS
! Writes are currently done synchronously even if the
! .Dv MS_ASYNC
! flag is specified.
.Sh HISTORY
The
.Fn msync
! function first appeared in
! .Bx 4.4 .
! It was modified to conform to
! .St -p1003.1b-93
! in
! .Nx 1.3 .
! .P
>Audit-Trail:
>Unformatted: