Subject: Re: clone()
To: Andrew Doran <ad@netbsd.org>
From: David Holland <dholland+netbsd@eecs.harvard.edu>
List: tech-kern
Date: 09/16/2007 00:47:29
On Sun, Sep 16, 2007 at 02:47:59AM +0100, Andrew Doran wrote:
> The clone() syscall means that we need to share some data structures among
> processes like the limits. Those structures can also be shared among LWPs
> within those processes, and are usually subject to some kind of COW scheme,
> so the reference counting and locking gets complicated.
>
> A while back rmind@ suggested changing clone() to create new LWPs
> within the same process (instead of creating a new process), and on the face
> of it that makes a lot of sense. It would remove some of the complexity around
> sharing those data structures.
Not really that much, though, because the data is ultimately still
shared, and so it still needs to be locked. The most one might gain is
having fewer distinct locks; but I'm not sure that's a great idea,
from either a maintainability or a performance/scalability standpoint.
It would allow abolishing refcounts for (some) things that struct proc
points to, but I'm not convinced this is worthwhile. (Note that the
limit structures, specifically, are shared among even non-threaded
procs to avoid tons of copying. I thought about abolishing this
optimization, but decided it was probably better to keep it. After
all, limits are rarely changed.)
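
To make the limit-sharing optimization concrete, here is a minimal
sketch of a refcounted, copy-on-write limits structure. All names
(plimit_sketch, lproc_sketch, lim_share, lim_set, lim_release) and the
NLIMITS value are made up for illustration and are not the actual
kernel interfaces. Locking is left out entirely; a real implementation
would need it around the refcount and the copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4	/* hypothetical; a real kernel has more */

struct plimit_sketch {
	unsigned refcnt;	/* number of procs pointing at this copy */
	long soft[NLIMITS];	/* stand-in for the real rlimit array */
};

struct lproc_sketch {
	struct plimit_sketch *limits;
};

/* Allocate a fresh, zeroed limits structure with one reference. */
static struct plimit_sketch *
lim_new(void)
{
	struct plimit_sketch *l = calloc(1, sizeof(*l));

	if (l != NULL)
		l->refcnt = 1;
	return l;
}

/* At fork time the child just takes another reference; nothing is copied. */
static void
lim_share(struct lproc_sketch *child, const struct lproc_sketch *parent)
{
	child->limits = parent->limits;
	child->limits->refcnt++;
}

/* On the (rare) write, a shared structure is copied first. */
static int
lim_set(struct lproc_sketch *p, int which, long val)
{
	struct plimit_sketch *old = p->limits;

	if (which < 0 || which >= NLIMITS)
		return -1;
	if (old->refcnt > 1) {
		struct plimit_sketch *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return -1;
		memcpy(copy, old, sizeof(*copy));
		copy->refcnt = 1;
		old->refcnt--;
		p->limits = copy;
	}
	p->limits->soft[which] = val;
	return 0;
}

/* Drop this proc's reference; the last one out frees the structure. */
static void
lim_release(struct lproc_sketch *p)
{
	if (--p->limits->refcnt == 0)
		free(p->limits);
	p->limits = NULL;
}
```

In this sketch a fork costs one increment, and only a proc that
actually changes a limit pays for a copy, which is why the optimization
is cheap to keep when limits are rarely changed.
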
Meanwhile, there's a problem: both clone() and the Irix sproc() are
variable-weight forks that allow choosing what you want to share. That
is, you can choose to share the address space but not the file table,
or vice versa, or whatever. (With sproc() you can apparently fork a
batch of processes that shares modifications to only their limits, if
for some mad reason you wanted to. The Linux code doesn't allow limit
sharing.)
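
A toy model of such a variable-weight fork may help show why each
piece needs its own refcount. The flag names (SHARE_VM, SHARE_FILES,
SHARE_LIMITS), the structures, and varfork_sketch() are all
hypothetical; they are not the real clone() or sproc() flags, and the
"copy" here is just a fresh allocation standing in for a real
duplicate.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical share flags, one bit per piece of process state. */
#define SHARE_VM	0x1u
#define SHARE_FILES	0x2u
#define SHARE_LIMITS	0x4u

/* Stand-in for any shareable piece (address space, file table, ...). */
struct res_sketch {
	unsigned refcnt;
};

struct vproc_sketch {
	struct res_sketch *vm;
	struct res_sketch *files;
	struct res_sketch *limits;
};

static struct res_sketch *
res_new(void)
{
	struct res_sketch *r = calloc(1, sizeof(*r));

	if (r != NULL)
		r->refcnt = 1;
	return r;
}

/* Either take a reference on the parent's piece or get a private one. */
static struct res_sketch *
res_inherit(struct res_sketch *parent, int share)
{
	if (share) {
		parent->refcnt++;
		return parent;
	}
	return res_new();	/* stand-in for a real copy */
}

/*
 * Each piece is shared or copied independently of the others, so each
 * one has to track its own set of users. Error cleanup is omitted.
 */
static int
varfork_sketch(const struct vproc_sketch *parent, struct vproc_sketch *child,
    unsigned flags)
{
	child->vm = res_inherit(parent->vm, (flags & SHARE_VM) != 0);
	child->files = res_inherit(parent->files, (flags & SHARE_FILES) != 0);
	child->limits = res_inherit(parent->limits,
	    (flags & SHARE_LIMITS) != 0);
	return (child->vm && child->files && child->limits) ? 0 : -1;
}
```

Calling varfork_sketch() with only SHARE_VM leaves the two procs with
one address space (refcount 2) and separate file tables and limits
(refcount 1 each), which is exactly the mixed case a single shared
proc structure cannot express.
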
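Since the whole point of lwps, at least in this context, is that they
always share the proc structure and thus always share all these pieces
without having to refcount them independently, I'm pretty sure
implementing clone() as lwp creation won't work: an lwp can't share
some of those pieces and keep private copies of others.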
--
- David A. Holland / dholland+netbsd@eecs.harvard.edu