> On May 7, 2020, at 5:47 PM, Taylor R Campbell <campbell+netbsd-source-changes-d%mumble.net@localhost> wrote: > >> Module Name: src >> Committed By: hannken >> Date: Thu May 7 09:12:03 UTC 2020 >> >> Modified Files: >> src/external/cddl/osnet/dist/uts/common/fs/zfs: zfs_vnops.c >> >> Log Message: >> Revert Rev. 1.63 and add a comment why we have to zil_commit() here: >> >> Operation zfs_znode.c::zfs_zget_cleaner() depends on this >> zil_commit() as a barrier to guarantee the znode cannot >> get freed before its log entries are resolved. > > We must be doing something wrong. > > The only times we should ever call zil_commit are when someone called > fsync or the file system is mounted sync=always. Calling zil_commit > whenever we delete a file absolutely wrecks performance and shouldn't > be needed for on-disk correctness in normal zfs semantics unless I > terribly misunderstand something, so we must be doing something wrong > with the in-memory state if we seem to need this. The problem is the way we get data in zfs_get_data(). Other OS use zfs_zget() to obtain a referenced vnode/znode and use it to retrieve tha data. For us this results in deadlocks either as locking-against-self if called during vcache_reclaim() or more difficult deadlocks as another operation needs to proceed and we currently have txg stopped from syncing. Chuq resolved it with zfs_zget_cleaner() that doesn't reference the vnode/znode and works for in-memory znodes only so we have to guarantee it doesn't get called after VOP_RECLAIM(). > Do you have a test case that can trigger the problem? Under load I get use-after-free of znodes in zfs_get_data() or entirely miss znodes here. See the other commit asserting zfs_zget_cleaner() never fails. -- J. Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig (Germany)
Attachment:
signature.asc
Description: Message signed with OpenPGP