Subject: None
To: Ty Sarna <tsarna@endicor.com>
From: Alistair G. Crooks <azcb0@uts.amdahl.com>
List: current-users
Date: 12/18/1995 04:08:17
Sorry in advance for the length of this message - I thought some
others might be interested...
> In article <199512172035.OAA00797@sierra.zyzzyva.com>,
> Randy Terbush <randy@zyzzyva.com> wrote:
> > Would it be possible for one of those with access to sync up
> > the libpthreads library to the latest release, and include
> > this in the src/lib/Makefile for compile? I would like to
> > seriously look at using this for the Java port, and would like
> > to know that there is some group support for libpthreads.
>
> Has anyone looked at whoever-it-was's rfork() and lightweight locking
> primitives support (for FreeBSD, I think, but I think he wrote a message
> to one of the NetBSD lists about it, indicating a willingness to port
> it), and doing threads on top of that (may have already been done by
> him). Kernel-based thread support seems like it would be much better than
> the simulation NetBSD's libpthreads provides, clever though it may be.
It's plan9's rfork, and I mentioned it to current-users in October:
> 4. There's a whole lot of discussion going on on the FreeBSD hacker's
> list with respect to kernel threads, and implementations of Plan9's
> rfork mechanism (upon which the smart money seems to be betting).
>
> > From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> > Date: Fri, 20 Oct 1995 09:10:43 -0400 (EDT)
> > Subject: Re: NetBSD/FreeBSD (pthreads)
> >
> > I implemented a simple version of plan9 rfork() a little while ago (well,
> > a year ago). You could rfork and end up with shared data space and file
> > table. I also implemented a very simple lock/unlock primitive that was
> > far more efficient than system v semaphores, since in the common case
> > (no contention for a lock) there's no jump to the kernel to wake up
> > other procs when you acquire or free a lock. Between these two things you
> > can do a lot: share data, share locked structures, share open files, etc.
> > I can do all of what i commonly do with kernel threads on, e.g., Irix. In
> > fact I implemented a simple user-space distributed shared memory with
> > these basic parts: on sgi's i used their kernel threads/kernel mutex
> > code, on freebsd i used rfork/lock code i built.
> >
> > For my money this is about as good as kernel threads. There's not the
> > additional complexity in the kernel (have you ever seen what LWP did to
> > sunos? No? good.).
> >
> > This code has been available gratis for a year. I can't convince anyone
> > to pull it into core for netbsd or freebsd, but I'll make the offer
> > again: you want it, let me know. The code, btw, is less than 100 lines
> > for each change. In fact the fastlock code is something like 25 lines.
> > I've implemented them as LKMs and directly as part of the kernel.
> >
> > ron
> >
> > Ron Minnich |Like a knife through Daddy's heart:
> > rminnich@earth.sarnoff.com |"Don't make fun of Windows, daddy! It takes care
> > (609)-734-3120 | of all my files and it's reliable and I like it".
>
> [The Plan 9 folks report that rfork is a win - there are very rarely
> two occurrences of rfork in their code with the same resource flags.
> For more information on the Plan9 stuff, see
> http://plan9.att.com/plan9/doc/9.html
> And if you're interested in Plan9, there's an interesting effort
> called VSTa, that does a lot of Plan9y things. GPLed, though. If
> you're interested, mail me for more info. -agc ]
[And, as another aside, I have seen what LWP did to SunOS, and I was
not impressed.]
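To make the fast lock idea in Ron's message a little more concrete, here
is a minimal sketch of a lock whose uncontended path never enters the
kernel. It is not Ron's fastlock code: the names (fastlock_t,
fastlock_try and so on) are hypothetical, it uses C11 atomics, which
postdate this discussion, and under contention it simply yields the CPU
instead of sleeping until woken, which is a simplification.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical type and names for illustration; not Ron's fastlock code. */
typedef struct {
    atomic_flag held;   /* set while some process or thread owns the lock */
} fastlock_t;

#define FASTLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once, entirely in user space. Returns 1 if the lock was acquired. */
static int
fastlock_try(fastlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/*
 * Acquire the lock. The uncontended path is a single atomic
 * test-and-set with no system call. Only under contention do we
 * enter the kernel, here simply by yielding the CPU (an assumption
 * made for brevity; a real lock would sleep until woken).
 */
static void
fastlock_acquire(fastlock_t *l)
{
    while (!fastlock_try(l))
        sched_yield();
}

/* Release the lock; again no system call is needed. */
static void
fastlock_release(fastlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

For this to work between rfork'ed processes, the fastlock_t itself would
have to live in memory that both processes share.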
To someone else who wondered when symlinks arrived, I'm fairly sure it
was 4.2 - 4.1c manual pages certainly didn't have anything about them,
as I was chastened to find out after coming out worse in a news
confrontation with Guy Harris (those were the days...)
More information on the actual implementation of rfork came from the
author in 3 separate messages:
> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 14:43:02 -0400 (EDT)
> Subject: anatomy of rfork, part 1: minherit
>
> i've had enough q's on this, and time is tight, so i thought i'd just put
> out a few messages on how to do rfork. The code is small, so bear with
> me.
>
> To do rfork as i needed it, you really need two parts to start with: a
> way to share data after fork and a way to share file tables after fork.
> AIX/370 implemented DCE threads with these two things. I thought i'd show
> minherit first. I don't know the plan9 environment erasing stuff,
> although that is pretty easy to add -- could be useful.
>
> minherit is shown below. Calls are much like mprotect:
> minherit(caddr, len, new inherit values)
>
> Look in vm/vm_inherit.h
>
> All you need to do is take the mprotect call code and redo it just a bit
> so it calls vm_map_inherit. Here we go:
>
> struct mprotect_args {
>     caddr_t addr;
>     int len;
>     int inherit;
> };
> int
> minherit(p, uap, retval)
>     struct proc *p;
>     struct mprotect_args *uap;
>     int *retval;
> {
>     vm_offset_t addr;
>     vm_size_t size;
>     register vm_inherit_t inherit;
>
> #ifdef DEBUG
>     printf("minherit(%d): addr %x len %x prot %d\n",
>         p->p_pid, uap->addr, uap->len, uap->inherit);
> #endif
>
>     addr = (vm_offset_t)uap->addr;
>     if ((addr & PAGE_MASK) || uap->len < 0)
>         return(EINVAL);
>     size = (vm_size_t)uap->len;
>     inherit = uap->inherit;
>
>     switch (vm_map_inherit(&p->p_vmspace->vm_map, addr, addr+size,
>         inherit)) {
>     case KERN_SUCCESS:
> #ifdef DEBUG
>         printf("works\n");
> #endif
>         return (0);
>     case KERN_PROTECTION_FAILURE:
> #ifdef DEBUG
>         printf("fails\n");
> #endif
>         return (EACCES);
>     }
> #ifdef DEBUG
>     printf("return einval\n");
> #endif
>     return (EINVAL);
> }
>
>
> ------------------------------
>
> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 15:02:53 -0400 (EDT)
> Subject: Re: anatomy of rfork, part 2: fork code
>
> This one is really easy. Basically you have to mod the fork code to take
> an option that indicates whether you dup the open file table for the
> process or simply bump the use count and use it for the child. The segment
> inheritance management has been done at this point: it gets done in user
> mode via the minherit() i showed in the previous note. I delete the middle
> parts that don't change ... it's about 10 lines of difference from a
> regular fork.
>
> Points to note: the parameter from user mode, if it has bit 0x80 set,
> means 'dup the file table', so i set the dupfd variable at the beginning.
> At the end, the code decides to either copy the table with fdcopy() or
> just bump counters. Note the include of sys/vnode.h: to make it work
> correctly, i have to undef KERNEL before and redefine it after the
> include. Ah well ... i think this oughtta get fixed somehow.
>
> note the implication of the option: fork is a special case of rfork.
>
> /*
> * Copyright (c) 1982, 1986, 1989, 1991, 1993
> * The Regents of the University of California. All rights reserved.
> * (c) UNIX System Laboratories, Inc.
> * All or some portions of this file are derived from material licensed
> * to the University of California by American Telephone and Telegraph
> * Co. or Unix System Laboratories, Inc. and are reproduced herein with
> * the permission of UNIX System Laboratories, Inc.
> *
> * Redistribution and use in source and binary forms, with or without
> * modification, are permitted provided that the following conditions
> * are met:
> * 1. Redistributions of source code must retain the above copyright
> * notice, this list of conditions and the following disclaimer.
> * 2. Redistributions in binary form must reproduce the above copyright
> * notice, this list of conditions and the following disclaimer in the
> * documentation and/or other materials provided with the distribution.
> * 3. All advertising materials mentioning features or use of this software
> * must display the following acknowledgement:
> * This product includes software developed by the University of
> * California, Berkeley and its contributors.
> * 4. Neither the name of the University nor the names of its contributors
> * may be used to endorse or promote products derived from this software
> * without specific prior written permission.
> *
> * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
> * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
> * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> * SUCH DAMAGE.
> *
> * @(#)kern_fork.c 8.6 (Berkeley) 4/8/94
> */
>
> #include <sys/param.h>
> #include <sys/systm.h>
> #include <sys/filedesc.h>
> #include <sys/kernel.h>
> #include <sys/malloc.h>
> #include <sys/proc.h>
> #include <sys/resourcevar.h>
> #include <sys/file.h>
> #include <sys/acct.h>
> #include <sys/ktrace.h>
>
> /* oh, yuck */
> /* this is due to include of vnode_if.h, which is automatically
> * generated. ouch.
> */
> #undef KERNEL
> #include <sys/vnode.h>
> #define KERNEL
> #define VREF(vp) (vp)->v_usecount++ /* increase reference */
>
> struct rfa { int opts; };
> /* ARGSUSED */
> rfork(p1, uap, retval)
>     struct proc *p1;
>     struct rfa *uap;
>     int retval[];
> {
>     int dupfd = 0; /* added for rfork() */
>
>     register struct proc *p2;
>     register uid_t uid;
>     struct proc *newproc;
>     struct proc **hash;
>     int count;
>     static int nextpid, pidchecked = 0;
>
>     if (uap->opts&0x80)
>         dupfd = 1;
>
>
>     /* DUPLICATE FORK CODE DELETED HERE ... */
>     .
>     .
>     .
>     /* END DELETED FORK CODE */
>     /* bump references to the text vnode (for procfs) */
>     p2->p_textvp = p1->p_textvp;
>     if (p2->p_textvp)
>         VREF(p2->p_textvp);
>
>     /* BEGIN CHANGED CODE FOR RFORK FOR DUPFD () */
>     if (dupfd)
>         p2->p_fd = fdcopy(p1);
>     else
>     {
>         /* make this a function at some point */
>         /* danger!!! no locks!!! */
>         p2->p_fd = p1->p_fd;
>         p2->p_fd->fd_refcnt++;
>     }
>     /* END CHANGED CODE FOR RFORK() */
>     /* MORE DELETED UNCHANGED CODE */
>     /* END DELETED CODE */
>     /*
>      * Return child pid to parent process,
>      * marking us as parent via retval[1].
>      */
>     retval[0] = p2->p_pid;
>     retval[1] = 0;
>     return (0);
> }
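The choice between copying the file table and sharing it is easy to
model outside the kernel. The following is a user-space sketch only:
struct fdtable and the fdtable_* functions are hypothetical stand-ins
for struct filedesc, fdcopy() and the reference-count bump, and like
the kernel fragment above it takes no locks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NFILES 8   /* arbitrary size for the model */

/* Hypothetical stand-in for the kernel's struct filedesc. */
struct fdtable {
    int refcnt;          /* how many processes point at this table */
    int files[NFILES];   /* stand-ins for open file pointers */
};

/* Model of fdcopy(): the child gets its own private copy. */
static struct fdtable *
fdtable_copy(const struct fdtable *src)
{
    struct fdtable *t = malloc(sizeof *t);

    if (t == NULL)
        return NULL;
    memcpy(t, src, sizeof *t);
    t->refcnt = 1;
    return t;
}

/* Model of the sharing path: same table, one more reference. */
static struct fdtable *
fdtable_share(struct fdtable *src)
{
    src->refcnt++;
    return src;
}

/* Pick a table for the child, following the 0x80 option bit. */
struct fdtable *
child_fdtable(struct fdtable *parent, int opts)
{
    return (opts & 0x80) ? fdtable_copy(parent) : fdtable_share(parent);
}

/* Drop one reference, freeing the table when the last user goes away. */
void
fdtable_release(struct fdtable *t)
{
    assert(t->refcnt > 0);
    if (--t->refcnt == 0)
        free(t);
}
```

With opts set to 0, a change made through the child's table shows up in
the parent's, because they are the same object. With 0x80 set, the two
tables diverge after the copy, which is the ordinary fork() behaviour.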
>
> ------------------------------
>
> From: "Ron G. Minnich" <rminnich@Sarnoff.COM>
> Date: Wed, 25 Oct 1995 15:27:15 -0400 (EDT)
> Subject: rfork part 3: library code
>
> All this function does is:
> 1) minherit the data space
> 2) call rfork with zero as the options value
> 3) return values. Only funniness is that you have to fake the return 0
> to kid behavior of fork(), so there's fooling around with getpid() before
> the call and testing of return values after the call.
>
> Also, there's a call to something called 'syscallfind' in here for the
> modload case. If anyone wants that code let me know. It uses modstat code
> to find the named syscall number.
>
> There you are. rfork in 3 parts. Questions to me.
>
> ron
>
> #include <stdio.h>
> #include <sys/param.h>
> #include <vm/vm.h>
> #include <vm/vm_inherit.h>
>
> int minherit(caddr_t, unsigned int, int);
>
> int
> rfork(int i)
> {
>     extern int end, sbrk();
>     int pid, newpid;
>     /* until it's a real syscall, we have to fake the zero-return */
>     unsigned long start, last;
>     static int rfsyscallnum = -1;
>
>     if (rfsyscallnum < 0)
>         rfsyscallnum = syscallfind("rfork");
>     if (rfsyscallnum < 0) {
>         perror("rfork syscallfind");
>         return -1;
>     }
>
>     /* for the modload version, we don't get two return values,
>      * so we have to fake the fork 'return 0 to kid' behavior
>      */
>     pid = getpid();
>
>     start = (unsigned long) ctob(btoc(&end));
>     last = sbrk(0);
>     /* the man page lies:
>      * it won't return page-aligned values from sbrk
>      * the seg is actually several pages larger!
>      */
>     last = ctob(btoc(last)+4);
>     /* may be nothing to share, ignore return errors */
>
>     if (minherit(start, last-start, VM_INHERIT_SHARE) < 0)
>         perror("minherit failed");
>
>     newpid = syscall(rfsyscallnum,i);
>
>     if (newpid == pid)
>         newpid = 0;
>     return newpid;
> }
>
>
> Ron Minnich |Like a knife through Daddy's heart:
> rminnich@earth.sarnoff.com |"Don't make fun of Windows, daddy! It takes care
> (609)-734-3120 | of all my files and it's reliable and I like it".
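The 'return 0 to kid' fix-up in part 3 is worth a closer look. Judging
from the comparison against getpid(), the raw call appears to hand the
child the parent's pid rather than zero. The sketch below reproduces
that trick with ordinary fork(); raw_fork() is a hypothetical stand-in
that imitates the assumed raw return convention, and fork_with_fixup()
is likewise a made-up name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the raw system call: like fork(), except
 * that the child sees the parent's pid as the return value instead
 * of 0. This convention is an assumption made for the example.
 */
static pid_t
raw_fork(void)
{
    pid_t r = fork();

    return (r == 0) ? getppid() : r;
}

/* Restore the familiar convention: 0 in the child, child pid in the parent. */
pid_t
fork_with_fixup(void)
{
    pid_t self = getpid();   /* taken before the call: the parent's pid */
    pid_t r = raw_fork();

    if (r == self)           /* only the child gets the parent's pid back */
        r = 0;
    return r;
}
```

The pid has to be sampled before the call: afterwards getpid() differs
between the two processes, and the comparison would no longer pick out
the child.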
I suppose this just muddies the waters somewhat, but it would be nice
to have, even if just in an OPTIONS_RFORK kernel config option.
Cheers,
Alistair
--
Alistair G. Crooks (agc@uts.amdahl.com) +44 125 234 6377
Amdahl European HQ, Dogmersfield Park, Hartley Wintney, Hants RG27 8TE, UK.
[These are only my opinions, and certainly not those of Amdahl Corporation]