https://bugs.freedesktop.org/show_bug.cgi?id=112242

            Bug ID: 112242
           Summary: amdgpu [RX Vega 56]: ring sdma0 timeout
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: [email protected]
          Reporter: [email protected]

Hi,

I've reported this over at bugzilla.kernel.org but didn't get any help there.
Maybe because nobody is expecting bug reports about the amdgpu driver over on
the kernel's bug tracker?

So this started a while ago, when I updated from 5.0.0 to a newer kernel. I'm
currently at 5.3.0 and for almost any game I play I run into this problem:

Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process 7DaysToDie.x86_ pid 8108 thread 7DaysToDie:cs0
Aug 24 11:13:33 egalite kernel: amdgpu 0000:0c:00.0: GPU reset begin!
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, but soft recovered

Only a hard reset let me recover from that.
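
For anyone comparing their own logs against the excerpt above, here is a
minimal sketch that pulls the ring name and the signaled/emitted sequence
numbers out of such lines. The function name and the returned field names are
made up for illustration, and the pattern only assumes the message format
shown in the excerpt.

```python
import re

# Matches ring-timeout messages of the form quoted in the log excerpt.
# Lines without sequence numbers (e.g. "ring gfx timeout, but soft
# recovered") are intentionally not matched.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout found in log_text (hypothetical helper)."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for match in TIMEOUT_RE.finditer(flat):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append({
            "ring": match["ring"],
            "signaled": signaled,
            "emitted": emitted,
            # Number of submitted jobs whose fence had not signaled yet.
            "pending": emitted - signaled,
        })
    return results
```

Run against the excerpt above, this reports one timeout on ring sdma0 with a
single pending job (emitted seq 368057, signaled seq 368056).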

I did some kernel traces which I will copy over to this report, if necessary,
but for now you can download them here:
https://bugzilla.kernel.org/show_bug.cgi?id=204683

It also looks a bit like this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=201957, because I also get the
"ring gfx timeout". And there are lots and lots of people having this issue.

I tried bisecting it, but failed. Either I missed the commit that causes this,
or there are multiple reasons why this happens, or this really goes way back to
the time when 4.18 was the base for drm-next. That base doesn't compile with
modern compilers anymore. Also, Steam doesn't want to run on those old kernels,
so even when I was able to compile an older kernel, there was no way to test
it.

I even tried debugging it over ethernet (KGDBoE is a nice thing if you need
performance), but somehow this slowed everything down enough to not trigger the
bug.

I also tried the suggestions from
https://bugs.freedesktop.org/show_bug.cgi?id=109955, but forbidding the lowest
clock mode doesn't help either. (It does fix my Rocket League problems, though.)
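
For reference, a rough sketch of what "forbidding the lowest clock mode" can
look like. It assumes the amdgpu sysfs interface is used, where pp_dpm_sclk
and pp_dpm_mclk list the available levels as "index: frequency" lines and
accept a space-separated list of allowed level indices. The helper names are
hypothetical, and the sketch only builds the string to write; it does not
touch sysfs itself.

```python
def parse_dpm_levels(table_text):
    """Extract level indices from pp_dpm_sclk/pp_dpm_mclk style output."""
    levels = []
    for line in table_text.splitlines():
        # Each line is expected to look like "<index>: <frequency>",
        # optionally followed by "*" for the currently active level.
        index, sep, _rest = line.partition(":")
        if sep and index.strip().isdigit():
            levels.append(int(index.strip()))
    return levels


def mask_without_lowest(table_text):
    """Build the allowed-levels string with the lowest level left out."""
    levels = sorted(parse_dpm_levels(table_text))
    return " ".join(str(level) for level in levels[1:])
```

The resulting string would then be written (as root) to the pp_dpm_sclk or
pp_dpm_mclk file of the card, after setting power_dpm_force_performance_level
to "manual". The exact device path, for example under /sys/class/drm/card0/device/,
is an assumption and differs between systems.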

Please advise what I should try next.

Best regards
Matthias

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel
