[Public] > From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Christian > König > Sent: Tuesday, May 27, 2025 8:24 PM > To: Khatri, Sunil <sunil.kha...@amd.com>; amd-gfx@lists.freedesktop.org; > Deucher, > Alexander <alexander.deuc...@amd.com> > Subject: Re: [PATCH] Revert "drm/amdgpu: promote the implicit sync to the > dependent read fences" > > On 5/27/25 13:27, Khatri, Sunil wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > -----Original Message----- > > From: Koenig, Christian <christian.koe...@amd.com> > > Sent: Tuesday, May 27, 2025 2:32 PM > > To: Khatri, Sunil <sunil.kha...@amd.com>; > > amd-gfx@lists.freedesktop.org; Deucher, Alexander > > <alexander.deuc...@amd.com> > > Subject: Re: [PATCH] Revert "drm/amdgpu: promote the implicit sync to the > dependent read fences" > > > > On 5/27/25 10:58, Sunil Khatri wrote: > >> This reverts commit 714bbbf20a7266e48632fab466563e695af9acb5. > >> bisected to this change which is causing the flikering issue in the > >> UI for various apps like glxgears and unigen heaven. > > > > Is that flickering also there when using kernel queues? > > > > If not then without an explanation where that flickering is coming from for > > user > queues I have to reject that. > > > > There is more to just flickering here. We are planning to use > amdgpu.user_queue=1 in our CI testing. That means both kernel and user > submission are enabled. When I ran the testing amdgpu_basic tests which runs > both > kernel and user submissions, all the user queue tests are consistently > failing. With > this revert change in place all those issues are fixed, Both user and kernel > queue > tests pass consistently without any artifacts. > > That sounds like we are missing some dependency and that works only by > coincident. > > > The reason of why this is helping is I am not sure of but it’s the > > observation which > was observed earlier by Arvind too. We could investigate the probable reason > of it > infact Arvind is checking that but to enable CI we could revert this if you > agree. > > I think figuring out what is missing here is more important than enabling CI. > > We most likely incorrectly sync to TLB flush fences on the kernel queues now > and > that could not only have negative performance side effects but also hide real > bugs > we need to fix ASAP. Maybe there're some bookkeeping fences that miss explicit sync when the driver doesn't sync them implicitly. I will further investigate it as well. If we need to revert the patch as a workaround, we need the following https://www.spinics.net/lists/amd-gfx/msg123346.html to handle the eviction fence leakage issue.
Thanks, Prike > Regards, > Christian. > > > > > Regards > > Sunil Khatri > > > > It seems to be the right thing TODO and we are still in the process of > > hammering > out all the bugs for user queues. > > > > Regards, > > Christian. > > > >> Also when we set user_queue=1, which enable both user and kernel > >> submissions the userqueue tests in IGT also fail. > >> > >> Signed-off-by: Sunil Khatri <sunil.kha...@amd.com> > >> --- > >> drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > >> index d6ae9974c952..5576ed0b508f 100644 > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > >> @@ -249,8 +249,9 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, > >> struct amdgpu_sync *sync, > >> > >> if (resv == NULL) > >> return -EINVAL; > >> - /* Implicitly sync only to KERNEL, WRITE and READ */ > >> - dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, f) { > >> + > >> + /* TODO: Use DMA_RESV_USAGE_READ here */ > >> + dma_resv_for_each_fence(&cursor, resv, > DMA_RESV_USAGE_BOOKKEEP, > >> + f) { > >> dma_fence_chain_for_each(f, f) { > >> struct dma_fence *tmp = > >> dma_fence_chain_contained(f); > >> > >