On 7/23/23 21:24, Taylor R Campbell wrote:
Date: Sun, 23 Jul 2023 12:36:10 +0900 From: PHO <pho%cielonegro.org@localhost> This: https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L45 and this: https://github.com/depressed-pho/netbsd-src/blob/91daa67f17222da355d3fddd6fa849c786d9c545/sys/external/bsd/drm2/dist/drm/vmwgfx/vmwgfx_fence.c#L527 vma_fence_manager::fence_list holds pointers to all the fence objects in use but the list doesn't retain their refcount. I tried to change the way how it works by simply incrementing the refcount in vmw_fence_obj_init() and decrementing it in __vmw_fences_update(). But it didn't work well because fences could also be released by dma_resv_add_excl_fence() and dma_resv_add_shared_fence(). I cannot recall all the details but sometimes refcounts went down to zero even if they were retained by fence_list, and that's when I gave up fighting against it. I admit I don't fully understand the code.Why can't the loop just use dma_fence_get_rcu and dma_fence_put? Might need to release and reacquire the lock around dma_fence_put.
I tried but it didn't work, apparently because someone was holding a pointer to a dying fence and waiting for it to be signaled. Using dma_fence_get_rcu() meant we didn't signal such fences, and the kernel hanged and eventually got killed by heartbeat().
If that doesn't work for some reason and we do have to add a special case, I'd strongly prefer to add a new front end like dma_fence_signal_vmwgfxspecialhack or something (other examples in the tasklet API) that has the relaxed assertion instead of relaxing the assertions for every other driver which isn't bonkers.
Alright. I'll do that.