> On 27. Sep 2018, at 09:44, Manuel Bouyer <bouyer%antioche.eu.org@localhost> wrote: > > On Thu, Sep 27, 2018 at 01:03:40AM +0000, Emmanuel Dreyfus wrote: >> Module Name: src >> Committed By: manu >> Date: Thu Sep 27 01:03:40 UTC 2018 >> >> Modified Files: >> src/sys/kern: vfs_trans.c >> >> Log Message: >> Work around deadlock between fstchg and fstcnt >> >> When suspending a filesystem in fstrans_setstate(), we wait on >> fstcnt for threads to finish transactions. While we do this, any >> thread trying to start a filesystem transaction will wait on fstchg >> in fstrans_start(), a situation which can deadlock. >> >> The wait for fstcnt in fstrans_setstate() can be interrupted by >> a signal, but the wait for fstchg in fstrans_start() cannot. Once >> most processes are stuck in fstchg, it is impossible to send a >> signal to the thread that waits on fstcnt, because no process >> respond anymore to user input. >> >> We fix that by adding a timeout to the wait on fstcnt in >> fstrans_setstate(). This means suspending a filesystem may fail, >> but it was already the case when the sleep was interupted by >> a signal, hence calling function must already handle a possible >> failure. > > Actually callers do not, they just forward the failure. > This means that things like e.g. umount or snapshots will randomly > fail (whenever fstrans_setstate() can actually take a long > time without deadlock has to be determined, but I suspect it can). > > I think fstrans_setstate() should pause and retry in this case. As Manuel said ... This change is wrong, there is no upper time limit for a successfull file system suspension. Please revert ASAP. I will take a look into this PR next week. -- J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig
Attachment:
signature.asc
Description: Message signed with OpenPGP