Current-Users archive
Re: Testing/Debugging amdgpu kernel driver in 10.0_BETA
On Fri, Dec 30, 2022 at 2:02 PM Jeff Frasca <thatguy%jeff-frasca.name@localhost> wrote:
>
> Hi all, I have a Lenovo ThinkPad E485 which has a Ryzen 3 2200U APU
> in it (which is a Vega/RAVEN RIDGE GPU). I installed the NetBSD 10.0
> BETA on it this week, and that's actually working better than
> Slackware 15.0 (which just refuses to boot). Everything seems to be
> working reasonably well, and X11 runs with the framebuffer drivers and
> llvmpipe. However, that only gets me 1024x768, and I know there's a
> big chunk of code sitting in the src tree that has the real drivers
> for it.
>
> I've got three kernels installed right now. The first two are the
> GENERIC kernel from the install and a custom kernel that just has the
> drivers I need and is set up to run with the framebuffer drivers. Both
> of those seem to work equally well. The third is another custom config
> that is the same as the working custom kernel, except that it also has
> the amdgpu drivers configured.
>
> That third kernel panics on boot in one of two ways:
>
> On cold boot, I get an error "PSP load tmr failed!"
>
> On a warm boot (after first loading any of the three kernels), I get
> an ETIMEDOUT error from the ring_test function in the driver.
>
> In both cases, that isn't the actual panic; the panic is a failed
> 'cv_is_valid()' assertion in a call to 'cv_destroy()' in
> 'drm_sched_fini()'. I think both of those errors cause
> 'amdgpu_driver_load_kms()' to fail, emit a "Fatal error during GPU
> init", and then jump to an error block which calls
> 'amdgpu_driver_unload_kms()', which then panics while trying to unload
> the driver.
After reading a bunch of code this afternoon, I think the cv_destroy()
call is failing because the ring structures never get initialized.
I'm pretty sure my APU uses the functions in amdgpu_vcn.c, and that
doesn't have any calls to amdgpu_init_ring(), which shows up in a lot
of the other files. Hopefully in the next day or two I can figure out
some code changes to try. Then I can move on to the next bug. (Maybe
the firmware doesn't load on cold boot, or maybe something is hiding
behind the missing-initialization bug. Probably the latter.)
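
To illustrate the suspected failure mode, here is a minimal userland sketch. It is not the actual driver code. Every name in it (fake_cv, fake_ring, fake_sched_init, fake_sched_fini, sched_ready) is hypothetical and only stands in for the cv_destroy() / drm_sched_fini() path described above. It assumes the teardown path can be reached for a ring whose scheduler was never set up, and shows one possible guard against that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel condvar. The 'valid' flag plays the
 * role of whatever cv_is_valid() checks in the real kernel. */
struct fake_cv {
	bool valid;
};

static void fake_cv_init(struct fake_cv *cv)
{
	cv->valid = true;
}

static void fake_cv_destroy(struct fake_cv *cv)
{
	/* Mirrors the failed assertion: destroying a condvar that was
	 * never initialized trips this check. */
	assert(cv->valid);
	cv->valid = false;
}

/* Hypothetical ring with an embedded scheduler condvar. */
struct fake_ring {
	struct fake_cv wake;
	bool sched_ready;	/* set only after a successful init */
};

static void fake_sched_init(struct fake_ring *ring)
{
	fake_cv_init(&ring->wake);
	ring->sched_ready = true;
}

/* Guarded teardown: skip rings whose scheduler was never initialized,
 * so an early failure in the load path can still unwind cleanly.
 * Returns true if something was actually torn down. */
static bool fake_sched_fini(struct fake_ring *ring)
{
	if (!ring->sched_ready)
		return false;
	fake_cv_destroy(&ring->wake);
	ring->sched_ready = false;
	return true;
}
```

Calling fake_sched_fini() on a zero-initialized fake_ring returns false instead of asserting. Without the sched_ready check, the same call would hit the assert in fake_cv_destroy(), which is the shape of the panic described above. Whether the real fix belongs in the unload path, in the missing ring initialization, or in both is still an open question.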
> Anyone have any tips on how to further debug this? I'm going to keep
> poking at it on my own, but any suggestions on where to look, things
> to try, I would love to hear them. Is there any more information I
> can/should send? Is there a good way to save backtraces and dmesg
> rather than typing it into an email manually on a different computer?
> Any cute ddb tricks that would help? I know this is going to need
> code changes (which I will try to work out), is there a better mailing
> list to send this to?
>
> (I'm typing this from a Kaveri APU based desktop that's dual booting
> Slackware 15 and NetBSD 9.99.x right now. If I make headway on the
> laptop, the desktop will get 10.0 next and I'll try to get bugs in
> its graphics drivers tested and ironed out.)
>
> Thanks!
> Jeff