tech-kern archive


Re: How to identify specific wait-state for a "DE" process?



This scenario reminds me of:

https://www.sqlite.org/compile.html#minimum_file_descriptor
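
As I read that page, SQLite avoids trouble with low-numbered descriptors by
refusing to use any below a configurable minimum. A similar trick might work
here as a userland workaround: duplicate the logging descriptor onto a number
above the /dev/filemon descriptor before handing it to filemon. This is only a
sketch. dup_above() is a name I made up, and /dev/null stands in for
/dev/filemon so it runs anywhere; fcntl(F_DUPFD) is the standard call.

```python
import fcntl
import os


def dup_above(fd, floor_fd):
    """Duplicate fd onto the lowest free descriptor numbered above floor_fd.

    Hypothetical helper: F_DUPFD returns the lowest available descriptor
    that is greater than or equal to its argument.
    """
    return fcntl.fcntl(fd, fcntl.F_DUPFD, floor_fd + 1)


if __name__ == "__main__":
    # Stand-in for the descriptor returned by opening /dev/filemon.
    monitor_fd = os.open(os.devnull, os.O_RDONLY)
    # Descriptor 1 is stdout; the copy is what would be given to filemon.
    log_fd = dup_above(1, monitor_fd)
    print("monitor fd:", monitor_fd, "logging fd:", log_fd)
    os.close(log_fd)
    os.close(monitor_fd)
```

With the logging copy numbered above the monitor descriptor, exit-time cleanup
reaches the monitor first, which matches the success scenario quoted below.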


-bch


On 1/5/16, Paul Goyette <paul%vps1.whooppee.com@localhost> wrote:
> On Wed, 6 Jan 2016, Paul Goyette wrote:
>
>> I need to figure out why this is a problem when filemon(4) "borrows"
>> the fd for stdout, but is not a problem when it borrows a real file.
>
> OK, I figured out what's going on.
>
> In the failure scenario, we have the following events:
>
>  	1. Process opens /dev/filemon and gets fd #3
>  	2. Process tells filemon to log activity to fd #1 (stdout)
>  	3. Process calls sys_exit(), which starts process cleanup
>  	4. Cleanup code calls fd_close() on every open descriptor in
>  	   ascending order, so it handles fd #0 and then fd #1
>  	5. fd #1 has another reference, so we wait on the condvar,
>  	   which never gets broadcast since there's no other thread
>  	   to run.  We hang here forever.
>
> In the success scenario, we have a slightly different sequence:
>
>  	1. Process opens /dev/filemon and gets fd #3
>  	2. Process opens up a temp file (or simply calls dup(stdout))
>  	   and gets fd #4;  the process tells filemon to log activity
>  	   to fd #4
>  	3. Process calls sys_exit(), which starts process cleanup
>  	4. Cleanup code calls fd_close() on every open descriptor in
>  	   ascending order, so it handles fd #0 and then fd #1
>  	5. In this scenario, fd #1 has no extra references, so it can
>  	   close normally.
>  	6. Cleanup proceeds with fd #2, and then gets to fd #3, where
>  	   /dev/filemon is open
>  	7. We call filemon_close() which calls fd_putfile() on fd #4.
>  	   This removes the additional reference on fd #4
>  	8. Cleanup moves on to fd #4 which now has only a single
>  	   reference, so it, too, can be successfully closed!
>
> As long as the /dev/filemon file descriptor is numerically smaller than
> the logging fd, it gets closed first, and everything works fine.  But we
> will hang if we try to close the logging file first because of the extra
> reference.
>
> Does anyone have any good suggestions for how to arrange for another
> thread/lwp to run so it can remove the extra reference to the logging
> descriptor?
>
>
> +------------------+--------------------------+------------------------+
> | Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
> | (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> +------------------+--------------------------+------------------------+
>
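
The ordering dependence in the two scenarios above can be checked with a toy
model. This is a userland sketch, not kernel code: simulate_exit() and its
reference table are made up for illustration, and the only facts assumed are
the ones in the quoted message (descriptors are closed in ascending order,
filemon holds one extra reference on the logging descriptor, and closing
/dev/filemon drops that reference).

```python
def simulate_exit(open_fds, filemon_fd, log_fd):
    """Close descriptors in ascending order, as the quoted exit path does.

    Returns (closed, stuck_fd); stuck_fd is None when every descriptor
    closes, otherwise it is the descriptor the cleanup would wait on.
    """
    # Every open descriptor starts with one reference; filemon holds one
    # extra reference on its logging descriptor.
    refs = {fd: 1 for fd in open_fds}
    refs[log_fd] += 1

    closed = []
    for fd in sorted(open_fds):
        if refs[fd] > 1:
            # The kernel would sleep on the condvar here. With no other
            # thread left to drop the reference, that sleep never ends.
            return closed, fd
        closed.append(fd)
        if fd == filemon_fd:
            # Models filemon_close() calling fd_putfile() on the log fd.
            refs[log_fd] -= 1
    return closed, None


if __name__ == "__main__":
    # Failure scenario: filemon on fd 3 logs to fd 1 (stdout).
    print(simulate_exit([0, 1, 2, 3], filemon_fd=3, log_fd=1))
    # Success scenario: filemon on fd 3 logs to fd 4.
    print(simulate_exit([0, 1, 2, 3, 4], filemon_fd=3, log_fd=4))
```

The first call stops at fd 1 after closing only fd 0; the second closes all
five descriptors, because fd 3 is closed before fd 4 is reached.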

