tech-kern archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
Re: fd code multithreaded race?
On Wed Aug 04 2010 at 13:21:07 +0000, Andrew Doran wrote:
> On Sat, Jul 31, 2010 at 08:31:19PM +0300, Antti Kantee wrote:
> > Hi,
> >
> > I'm looking at a KASSERT which is triggering quite rarely for me (in
> > terms of iterations):
> >
> > panic: kernel diagnostic assertion "dt->dt_ff[i]->ff_refcnt == 0" failed:
> > file
> > "/usr/allsrc/src/sys/rump/librump/rumpkern/../../../kern/kern_descrip.c",
> > line 856
> >
> > Upon closer examination, it seems that this can trigger while another
> > thread is in fd_getfile() between upping the refcount, testing for
> > ff_file, and fd_putfile(). Removing the KASSERT seems to restore correct
>
> You're right there, the KASSERT() is wrong, it should be removed.
Thanks, I'll do that.
> > operation, but I didn't read the code far enough to see where the race
> > is actually handled and what stops the code from using the wrong file.
>
> FYI the fdfile_t (per-descriptor records) are stable for the lifetime of the
> process, what each record descibes can and does of course change, and how
> those records are pointed to does change (fdtab_t).
>
> There isn't really a concept of "wrong file", as in, the app gets
> what it asked for. It is free to ask for the wrong thing, and it's free
> to ask for the right thing at the wrong time, etc - that's its problem.
>
> Unless you're alluding to another bug?
Not really. I just started thinking about how applications can make
sure they use the right file descriptor. It seems using close() to
notify other threads of a file descriptor being closed is racy.
So something naiive like this:
t1: lock
t1: get fd1
t1: unlock
/* t1 wants to do a syscall with fd1 but is preempted */
t2: lock
t2: close fd1
t2: unlock
t3: lock
t3: open, result fd1
t3: unlock
t1: syscall fd1 ...
will give you the wrong result. Essentially there is no interlock from
the application lookup to the kernel backing object lookup.
So I guess if you want things to work correctly, instead of close()
you need to dup2() to a zombie/"deadfs" fd and wait for all threads to
check in before you can close it. (i assume dup2 is atomic)
Never realized file descriptors and threads were so tricky ;)
Home |
Main Index |
Thread Index |
Old Index