tech-kern archive
Re: How to identify specific wait-state for a "DE" process?
This scenario reminds me of:
https://www.sqlite.org/compile.html#minimum_file_descriptor
-bch
On 1/5/16, Paul Goyette <paul%vps1.whooppee.com@localhost> wrote:
> On Wed, 6 Jan 2016, Paul Goyette wrote:
>
>> I need to figure out why this is a problem when filemon(4) "borrows" the
>> fd for stdout, but is not a problem when it borrows a real file.
>
> OK, I figured out what's going on.
>
> In the failure scenario, we have the following events:
>
> 1. Process opens /dev/filemon and gets fd #3
> 2. Process tells filemon to log activity to fd #1 (stdout)
> 3. Process calls sys_exit(), which starts process cleanup
> 4. Clean-up code tries to fd_close all open descriptors, in
> order, so handles fd #0 and then fd #1
> 5. fd #1 has another reference (taken by filemon in step 2), so we
> wait on the condvar, which never gets broadcast since there's no
> other thread to run. We hang here forever.
>
> In the success scenario, we have a slightly different sequence:
>
> 1. Process opens /dev/filemon and gets fd #3
> 2. Process opens up a temp file (or simply calls dup(stdout))
> and gets fd #4; the process tells filemon to log activity
> to fd #4
> 3. Process calls sys_exit(), which starts process cleanup
> 4. Clean-up code tries to fd_close all open descriptors, in
> order, so handles fd #0 and then fd #1
> 5. In this scenario, fd #1 has no extra references, so it can
> close normally.
> 6. Cleanup proceeds with fd #2, and then gets to fd #3, where
> /dev/filemon is open
> 7. We call filemon_close() which calls fd_putfile() on fd #4.
> This removes the additional reference on fd #4
> 8. Cleanup moves on to fd #4 which now has only a single
> reference, so it, too, can be successfully closed!
>
> As long as the /dev/filemon file descriptor is numerically smaller than
> the logging fd, it gets closed first, and everything works fine. But we
> will hang if we try to close the logging file first because of the extra
> reference.
>
> Does anyone have any good suggestions for how to arrange for another
> thread/lwp to run so it can remove the extra reference to the logging
> descriptor?
>
>
> +------------------+--------------------------+------------------------+
> | Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
> | (Retired) | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> +------------------+--------------------------+------------------------+
>
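The two close sequences in the quoted message can be modelled in a few lines of userland Python. This is only a rough sketch of the ordering problem, not kernel code: `simulate_exit` and its arguments are hypothetical names, reference counts are plain integers, and the condvar wait is represented by returning the descriptor on which cleanup would block.

```python
def simulate_exit(open_fds, filemon_fd, log_fd):
    """Toy model of the exit-time close loop described in the quoted message.

    Returns (closed, hung_on): the descriptors closed so far, in order, and
    the descriptor whose close would block forever (None if cleanup finishes).
    """
    # Every open descriptor starts with one reference: the process's own.
    refs = {fd: 1 for fd in open_fds}
    # filemon "borrows" the logging descriptor, which adds a second reference.
    refs[log_fd] += 1

    closed = []
    for fd in sorted(open_fds):  # cleanup walks descriptors in ascending order
        if fd == filemon_fd:
            # Closing /dev/filemon drops the borrowed reference
            # (filemon_close() calling fd_putfile() in the message).
            refs[log_fd] -= 1
        if refs[fd] > 1:
            # An extra reference remains. The real code waits on a condvar
            # here; with no other thread to run, that wait never returns.
            return closed, fd
        closed.append(fd)
        del refs[fd]
    return closed, None


# Failure scenario: filemon is fd 3 and logs to fd 1 (stdout).
print(simulate_exit([0, 1, 2, 3], filemon_fd=3, log_fd=1))
# prints ([0], 1)

# Success scenario: filemon is fd 3 and logs to fd 4.
print(simulate_exit([0, 1, 2, 3, 4], filemon_fd=3, log_fd=4))
# prints ([0, 1, 2, 3, 4], None)
```

In this model the outcome depends only on whether `filemon_fd` is numerically smaller than `log_fd`, which matches the observation in the quoted message.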