On 7/23/23 17:27, PHO wrote:
> On 7/22/23 22:41, Taylor R Campbell wrote:
>>> Date: Sat, 22 Jul 2023 21:52:40 +0900
>>> From: PHO <pho%cielonegro.org@localhost>
>>>
>>> Jul 17 00:52:34 netbsd-current /netbsd: [ 64017.6151161] vmw_fence_wait() at netbsd:vmw_fence_wait+0xdc
>>
>> Just to confirm, what does `info line *(vmw_fence_wait+0xdc)' say in
>> gdb? And, if you can get to the frame in gdb, what does gdb say
>> &cb.wq is in the vmw_fence_wait frame, and what cv is in the
>> cv_destroy frame? Let's confirm it is the cv you think it is -- I
>> suspect it might be a different one.
>
> I just encountered the crash and could obtain a crash dump. It is
> indeed the "DRM_DESTROY_WAITQUEUE(&cb.wq)" in vmw_fence_wait() but the
> contents of cb does not make sense to me:
>
> ...
>
> CV_SLEEPQ(cv) is 0x01 (wtf) and CV_WMESG(cv) is not even a string?
I realized the cause of this:

static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout)
{
	...
	if (likely(vmw_fence_obj_signaled(fence)))
		return timeout;
	...
	spin_lock(f->lock);

	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))
		goto out; // <-- THIS ONE

	if (intr && signal_pending(current)) {
		ret = -ERESTARTSYS;
		goto out; // <-- OR THIS
	}

#ifdef __NetBSD__
	DRM_INIT_WAITQUEUE(&cb.wq, "vmwgfxwf");
#else
	cb.task = current;
#endif
	...
out:
	spin_unlock(f->lock);
#ifdef __NetBSD__
	DRM_DESTROY_WAITQUEUE(&cb.wq);
#endif
	...
}

There were cases where the function was destroying a condvar that it didn't initialize! Ugh, this is the very reason why I dislike C...
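
For illustration only, here is a small userspace model of one possible way to avoid the problem: initialize the wait queue before the first "goto out", so that every path reaching the label destroys something that was actually initialized. This is a sketch, not the actual patch. The names fake_waitqueue, fake_init_waitqueue, fake_destroy_waitqueue and fake_fence_wait are hypothetical stand-ins, and the locking and the wait loop are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel wait queue; not the real DRM type. */
struct fake_waitqueue {
	bool initialized;
};

static int init_count;		/* number of successful inits */
static int destroy_count;	/* number of successful destroys */

static void fake_init_waitqueue(struct fake_waitqueue *wq)
{
	wq->initialized = true;
	init_count++;
}

static void fake_destroy_waitqueue(struct fake_waitqueue *wq)
{
	/* Destroying a queue that was never initialized is the bug. */
	assert(wq->initialized);
	wq->initialized = false;
	destroy_count++;
}

/*
 * Same shape as the excerpt above, except that the init call sits
 * before the first "goto out" instead of after the early exits.
 */
static long fake_fence_wait(bool already_signaled, bool signal_pending,
    long timeout)
{
	struct fake_waitqueue wq = { .initialized = false };
	long ret = timeout;

	fake_init_waitqueue(&wq);	/* moved up: precedes every goto */

	/* spin_lock(f->lock) would go here */

	if (already_signaled)
		goto out;

	if (signal_pending) {
		ret = -1;		/* stands in for -ERESTARTSYS */
		goto out;
	}

	/* ... the actual wait loop would go here ... */

out:
	/* spin_unlock(f->lock) would go here */
	fake_destroy_waitqueue(&wq);	/* now always paired with the init */
	return ret;
}
```

Calling fake_fence_wait(true, false, 10) or fake_fence_wait(false, true, 10) exercises the two early exits marked in the excerpt. In this model both go through the destroy call without tripping the assertion. An alternative with the same effect would be a second label that skips the destroy.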